Scrapy - Exceptions



Description

Exceptions are irregular events that interrupt normal processing. In Scrapy, exceptions are raised for reasons such as a missing configuration or an item being dropped from the item pipeline. The following is a list of the exceptions built into Scrapy and how they are used.

DropItem

An item pipeline raises this exception to stop processing an item at any stage. It can be written as:

exception scrapy.exceptions.DropItem

CloseSpider

A spider callback can raise this exception to request that the spider be closed. It can be written as:

exception scrapy.exceptions.CloseSpider(reason='cancelled')

It takes a parameter called reason (str), which specifies the reason for closing.

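For instance, the following code shows how to use this exception:

from scrapy.exceptions import CloseSpider

def parse_page(self, response):
   # response.body is bytes, so compare against a bytes literal
   if b'Bandwidth exceeded' in response.body:
      raise CloseSpider('bandwidth_exceeded')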

IgnoreRequest

The scheduler or a downloader middleware can raise this exception to indicate that a request should be ignored. It can be written as:

exception scrapy.exceptions.IgnoreRequest

NotConfigured

It indicates a missing configuration and should be raised in a component constructor. It can be written as:

exception scrapy.exceptions.NotConfigured

The following components can raise this exception to indicate that they will remain disabled:

  • Extensions
  • Item pipelines
  • Downloader middlewares
  • Spider middlewares

NotSupported

This exception is raised to indicate that a feature or method is not supported. It can be written as:

exception scrapy.exceptions.NotSupported