Scrapy - Logging



Description

Logging means tracking events as a program runs. Scrapy uses Python's built-in logging system, which defines functions and classes that applications and libraries can use to emit log messages. It works out of the box and can be adjusted with the Scrapy settings listed under Logging Settings.

Scrapy sets some default logging options and applies them with the help of scrapy.utils.log.configure_logging() when running commands.

Log levels

In Python, there are five standard levels of severity for a log message. The following list shows them in ascending order −

  • logging.DEBUG − for debugging messages (lowest severity)

  • logging.INFO − for informational messages

  • logging.WARNING − for warning messages

  • logging.ERROR − for regular errors

  • logging.CRITICAL − for critical errors (highest severity)
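As a quick check of the ordering above, each level constant in Python's standard logging module is a plain integer, and a logger drops messages below its configured level. The following is a minimal sketch using only the standard library; the logger name 'levels_demo' is an arbitrary name chosen for illustration.

```python
import logging

# The five standard levels, listed from lowest to highest severity.
levels = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

for level in levels:
    # getLevelName maps the numeric constant back to its name.
    print(level, logging.getLevelName(level))

# A logger set to WARNING ignores DEBUG and INFO messages.
demo_logger = logging.getLogger('levels_demo')  # hypothetical name
demo_logger.setLevel(logging.WARNING)
print(demo_logger.isEnabledFor(logging.INFO))   # INFO is below WARNING
print(demo_logger.isEnabledFor(logging.ERROR))  # ERROR is above WARNING
```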

How to Log Messages

The following code logs a message at the INFO level using logging.info.

import logging 
logging.info("This is an information")

The same message can be logged with logging.log, which takes the level as its first argument −

import logging 
logging.log(logging.INFO, "This is an information")

You can also obtain a logger object with logging.getLogger and call its methods to log the message −

import logging
logger = logging.getLogger()
logger.info("This is an information")

There can be multiple loggers, each identified by a name. Pass the name to the logging.getLogger function to get the corresponding logger −

import logging
logger = logging.getLogger('mycustomlogger')
logger.info("This is an information")

A customized logger can be created for any module using the __name__ variable, which contains the module path, as follows −

import logging
logger = logging.getLogger(__name__)
logger.info("This is an information")
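To illustrate how named loggers behave, the sketch below relies only on the standard library. It shows that requesting the same name twice yields the same logger object, and that dotted names (such as module paths) form a parent/child hierarchy. The names 'myproject' and 'myproject.spiders' are hypothetical and stand in for a real package layout.

```python
import logging

# Asking for the same name twice returns the same logger object.
first = logging.getLogger('mycustomlogger')
second = logging.getLogger('mycustomlogger')
print(first is second)

# Dotted names form a hierarchy: 'myproject.spiders' is a child of 'myproject'.
parent = logging.getLogger('myproject')          # hypothetical package name
child = logging.getLogger('myproject.spiders')   # hypothetical module path
print(child.parent is parent)
```

This is one reason __name__ is a convenient logger name: loggers for modules in the same package end up under a common parent.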

Logging from Spiders

Every spider instance has a logger, which can be used as follows −

import scrapy 

class LogSpider(scrapy.Spider):  
   name = 'logspider' 
   start_urls = ['http://dmoz.com']  
   def parse(self, response): 
      self.logger.info('Parse function called on %s', response.url)

In the above code, the logger is created using the Spider’s name, but you can use any customized logger provided by Python as shown in the following code −

import logging
import scrapy

logger = logging.getLogger('customizedlogger')
class LogSpider(scrapy.Spider):
   name = 'logspider'
   start_urls = ['http://dmoz.com']

   def parse(self, response):
      logger.info('Parse function called on %s', response.url)

Logging Configuration

Loggers do not display messages on their own. They require "handlers", which redirect the messages to their respective destinations such as files, emails, and standard output.
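The relationship between a logger and its handler can be sketched with the standard library alone. In this illustration, the handler writes to an in-memory buffer rather than a file or the console, and the logger name 'handler_demo' is an arbitrary choice; it is not how Scrapy itself sets up its handlers.

```python
import io
import logging

# Capture output in memory so the example needs no file or console.
buffer = io.StringIO()

# The handler decides where messages go and how they are formatted.
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

logger = logging.getLogger('handler_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False   # keep the message away from any root handlers
logger.addHandler(handler)

logger.info('This is an information')
print(buffer.getvalue())
```

Swapping StreamHandler for logging.FileHandler would send the same message to a file instead.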

Depending on the following settings, Scrapy configures a handler for the root logger.

Logging Settings

The following settings are used to configure the logging −

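  • The LOG_FILE and LOG_ENABLED decide the destination for log messages. If LOG_FILE is set, messages are written to that file using the encoding given by LOG_ENCODING; otherwise, they are shown on standard error.

  • When you set the LOG_ENABLED to False, it won't display the log output messages.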

  • The LOG_LEVEL sets the minimum severity of messages to record; messages with lower severity are filtered out.

  • The LOG_FORMAT and LOG_DATEFORMAT are used to specify the layouts for all messages.

  • When you set the LOG_STDOUT to true, all the standard output and error messages of your process will be redirected to log.
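The settings above could be combined in a project's settings.py roughly as follows. This is only a sketch: the file name 'scrapy.log' is hypothetical, and the other values are example choices rather than recommendations.

```python
# settings.py (fragment): example values only

LOG_ENABLED = True        # set to False to suppress all log output
LOG_FILE = 'scrapy.log'   # hypothetical file name; leave unset to log to standard error
LOG_ENCODING = 'utf-8'    # encoding used when writing to LOG_FILE
LOG_LEVEL = 'INFO'        # DEBUG messages are filtered out at this level

# Layout of each message and of the timestamp inside it.
LOG_FORMAT = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
LOG_DATEFORMAT = '%Y-%m-%d %H:%M:%S'

LOG_STDOUT = False        # set to True to redirect print() output to the log
```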

Command-line Options

Scrapy settings can be overridden by passing command-line arguments as shown in the following table −

Sr.No   Command               Description
1       --logfile FILE        Overrides LOG_FILE
2       --loglevel/-L LEVEL   Overrides LOG_LEVEL
3       --nolog               Sets LOG_ENABLED to False

scrapy.utils.log module

The following function from this module can be used to initialize logging defaults for Scrapy.

scrapy.utils.log.configure_logging(settings = None, install_root_handler = True)

Sr.No   Parameter                     Description
1       settings (dict, None)         Settings used to create and configure the handler for the root logger. By default, it is None.
2       install_root_handler (bool)   Whether to install the root logging handler. By default, it is True.

The above function −

  • Routes warnings and Twisted logging through Python standard logging.
  • Assigns DEBUG to Scrapy and ERROR level to Twisted loggers.
  • Routes stdout to log, if LOG_STDOUT setting is true.

Default options can be overridden using the settings argument; when settings is not specified, the defaults are used. When install_root_handler is set to True, a handler is created for the root logger. If it is set to False, this function does not set up any log output. Scrapy commands call configure_logging automatically, but it needs to be called explicitly when running custom scripts.

To configure the logging output manually, you can use logging.basicConfig() as follows −

import logging
from scrapy.utils.log import configure_logging

# Skip Scrapy's root handler so that basicConfig can install its own.
configure_logging(install_root_handler=False)
logging.basicConfig(
   filename='logging.txt',
   format='%(levelname)s: %(message)s',  # %(message)s holds the logged text
   level=logging.INFO
)