Scrapy - Stats Collection



Description

Stats Collector is a facility provided by Scrapy to collect stats in the form of key/value pairs. It is accessed using the Crawler API (the Crawler object provides access to all Scrapy core components). The stats collector provides one stats table per spider, which it opens automatically when the spider opens and closes when the spider is closed.

Common Stats Collector Uses

The following code accesses the stats collector using the stats attribute of the crawler.

class ExtensionThatAccessStats(object):
   def __init__(self, stats):
      # Keep a reference to the crawler's stats collector.
      self.stats = stats

   @classmethod
   def from_crawler(cls, crawler):
      # The stats collector is exposed as the stats attribute of the crawler.
      return cls(crawler.stats)
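
Building on that, an extension typically records stats in response to signals. The following is a minimal sketch, not a standard Scrapy extension; the class name HostnameStatsExtension and the stat key are illustrative, and the extension is assumed to be enabled through the EXTENSIONS setting −

import socket

from scrapy import signals

class HostnameStatsExtension(object):
   # Hypothetical extension that records the crawling host in the stats.
   def __init__(self, stats):
      self.stats = stats

   @classmethod
   def from_crawler(cls, crawler):
      ext = cls(crawler.stats)
      # Run ext.spider_opened whenever a spider is opened.
      crawler.signals.connect(ext.spider_opened, signal = signals.spider_opened)
      return ext

   def spider_opened(self, spider):
      # Store the hostname of the machine running the crawl.
      self.stats.set_value('hostname', socket.gethostname())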

The following table shows various methods that can be used with the stats collector −

Sr.No  Method  Description

1  stats.set_value('hostname', socket.gethostname())
   Sets the given stats value.

2  stats.inc_value('customized_count')
   Increments the stat value.

3  stats.max_value('max_items_scraped', value)
   Sets the stat value, but only if it is greater than the previous value.

4  stats.min_value('min_free_memory_percent', value)
   Sets the stat value, but only if it is lower than the previous value.

5  stats.get_value('customized_count')
   Fetches the stat value.

6  stats.get_stats()
   Fetches all the stats, for example − {'custom_count': 1, 'start_time': datetime.datetime(2009, 7, 14, 21, 47, 28, 977139)}
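
These methods can also be called from a spider through its crawler attribute. The following is a minimal sketch; the spider name stats_demo, the URL, and the stat keys are placeholders −

import scrapy

class StatsDemoSpider(scrapy.Spider):
   # Hypothetical spider used only to illustrate the stats API.
   name = 'stats_demo'
   start_urls = ['http://example.com']

   def parse(self, response):
      stats = self.crawler.stats
      # Count every parsed page under a custom key.
      stats.inc_value('customized_count')
      # Record the largest response body seen so far.
      stats.max_value('max_response_bytes', len(response.body))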

Available Stats Collectors

Scrapy provides different types of stats collectors, which can be selected using the STATS_CLASS setting.

MemoryStatsCollector

It is the default stats collector. It maintains the stats of every spider used for scraping, and the data is stored in memory.

class scrapy.statscollectors.MemoryStatsCollector
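
Because the data stays in memory, the stats of the last run of each spider remain accessible through the collector's spider_stats attribute, a dict of dicts keyed by spider name. A minimal sketch, assuming a crawl has already run −

# 'stats_demo' is the hypothetical spider name used earlier.
last_run = crawler.stats.spider_stats.get('stats_demo', {})
print(last_run.get('item_scraped_count'))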

DummyStatsCollector

This stats collector does nothing, which makes it very efficient. It can be set using the STATS_CLASS setting to disable stats collection and improve performance.

class scrapy.statscollectors.DummyStatsCollector
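
For example, to disable stats collection, point the STATS_CLASS setting at it in the project's settings.py −

# settings.py
STATS_CLASS = 'scrapy.statscollectors.DummyStatsCollector'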