Scrapy - Other Settings
The following table shows other settings of Scrapy:
Sr.No | Setting & Description |
---|---|
1 | AJAXCRAWL_ENABLED It enables the AjaxCrawlMiddleware, which is recommended for broad crawls. Default value: False |
2 | AUTOTHROTTLE_DEBUG It enables AutoThrottle debug mode, which displays stats on every received response so that you can see how the throttling parameters are adjusted in real time. Default value: False |
3 | AUTOTHROTTLE_ENABLED It is used to enable AutoThrottle extension. Default value: False |
4 | AUTOTHROTTLE_MAX_DELAY It is used to set the maximum download delay (in seconds) in case of high latencies. Default value: 60.0 |
5 | AUTOTHROTTLE_START_DELAY It is used to set the initial download delay (in seconds). Default value: 5.0 |
6 | AUTOTHROTTLE_TARGET_CONCURRENCY It defines the average number of requests Scrapy should send in parallel to each remote site. Default value: 1.0 |
7 | CLOSESPIDER_ERRORCOUNT It defines the number of errors that must be received before the spider is closed (0 disables the limit). Default value: 0 |
8 | CLOSESPIDER_ITEMCOUNT It defines the number of items that must be scraped before the spider is closed (0 disables the limit). Default value: 0 |
9 | CLOSESPIDER_PAGECOUNT It defines the maximum number of responses to crawl before the spider is closed (0 disables the limit). Default value: 0 |
10 | CLOSESPIDER_TIMEOUT It defines the number of seconds a spider may stay open before it is closed (0 disables the limit). Default value: 0 |
11 | COMMANDS_MODULE It is used when you want to add custom commands in your project. Default value: '' |
12 | COMPRESSION_ENABLED It indicates that the compression middleware is enabled. Default value: True |
13 | COOKIES_DEBUG If set to true, all the cookies sent in requests and received in responses are logged. Default value: False |
14 | COOKIES_ENABLED It indicates that the cookies middleware is enabled; if it is disabled, no cookies are sent to web servers. Default value: True |
15 | FILES_EXPIRES It defines the delay for the file expiration. Default value: 90 days |
16 | FILES_RESULT_FIELD It is set when you want to use other field names for your processed files. |
17 | FILES_STORE It is used to store the downloaded files by setting it to a valid value. |
18 | FILES_STORE_S3_ACL It is used to modify the ACL policy for the files stored in Amazon S3 bucket. Default value: private |
19 | FILES_URLS_FIELD It is set when you want to use other field name for your files URLs. |
20 | HTTPCACHE_ALWAYS_STORE If this setting is enabled, pages are cached unconditionally. Default value: False |
21 | HTTPCACHE_DBM_MODULE It is the database module used by the DBM storage backend. Default value: 'anydbm' ('dbm' on Python 3) |
22 | HTTPCACHE_DIR It is the directory used for storing the HTTP cache. Default value: 'httpcache' |
23 | HTTPCACHE_ENABLED It indicates that HTTP cache is enabled. Default value: False |
24 | HTTPCACHE_EXPIRATION_SECS It is used to set the expiration time (in seconds) for cached requests; 0 means cached requests never expire. Default value: 0 |
25 | HTTPCACHE_GZIP If set to true, all the cached data is compressed with gzip. Default value: False |
26 | HTTPCACHE_IGNORE_HTTP_CODES It lists the HTTP status codes whose responses should not be cached. Default value: [] |
27 | HTTPCACHE_IGNORE_MISSING If enabled, requests that are not found in the cache are ignored instead of being downloaded. Default value: False |
28 | HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS It is a list of Cache-Control directives in responses to be ignored. Default value: [] |
29 | HTTPCACHE_IGNORE_SCHEMES It lists the URI schemes whose responses should not be cached. Default value: ['file'] |
30 | HTTPCACHE_POLICY It defines a class implementing cache policy. Default value: 'scrapy.extensions.httpcache.DummyPolicy' |
31 | HTTPCACHE_STORAGE It is a class implementing the cache storage. Default value: 'scrapy.extensions.httpcache.FilesystemCacheStorage' |
32 | HTTPERROR_ALLOWED_CODES It is a list of non-200 status codes whose responses are passed on to the spider. Default value: [] |
33 | HTTPERROR_ALLOW_ALL When this setting is enabled, all responses are passed on to the spider regardless of their status codes. Default value: False |
34 | HTTPPROXY_AUTH_ENCODING It defines the default encoding for proxy authentication on HttpProxyMiddleware. Default value: "latin-1" |
35 | IMAGES_EXPIRES It defines the delay for the images expiration. Default value: 90 days |
36 | IMAGES_MIN_HEIGHT It is used to drop images whose height is smaller than the given minimum. |
37 | IMAGES_MIN_WIDTH It is used to drop images whose width is smaller than the given minimum. |
38 | IMAGES_RESULT_FIELD It is set when you want to use other field name for your processed images. |
39 | IMAGES_STORE It is used to store the downloaded images by setting it to a valid value. |
40 | IMAGES_STORE_S3_ACL It is used to modify the ACL policy for the images stored in Amazon S3 bucket. Default value: private |
41 | IMAGES_THUMBS It is set to create the thumbnails of downloaded images. |
42 | IMAGES_URLS_FIELD It is set when you want to use other field name for your images URLs. |
43 | MAIL_FROM It is the sender email address used for sending emails. Default value: 'scrapy@localhost' |
44 | MAIL_HOST It is the SMTP host used to send emails. Default value: 'localhost' |
45 | MAIL_PASS It is the password used for SMTP authentication. Default value: None |
46 | MAIL_PORT It is the SMTP port used to send emails. Default value: 25 |
47 | MAIL_SSL When enabled, it enforces connecting over an SSL-encrypted connection. Default value: False |
48 | MAIL_TLS When enabled, it enforces connecting using STARTTLS. Default value: False |
49 | MAIL_USER It is the user name used for SMTP authentication. Default value: None |
50 | METAREFRESH_ENABLED It indicates that meta refresh middleware is enabled. Default value: True |
51 | METAREFRESH_MAXDELAY It is the maximum meta-refresh delay (in seconds) for which the redirect is followed. Default value: 100 |
52 | REDIRECT_ENABLED It indicates that the redirect middleware is enabled. Default value: True |
53 | REDIRECT_MAX_TIMES It defines the maximum number of times for a request to redirect. Default value: 20 |
54 | REFERER_ENABLED It indicates that referrer middleware is enabled. Default value: True |
55 | RETRY_ENABLED It indicates that the retry middleware is enabled. Default value: True |
56 | RETRY_HTTP_CODES It defines which HTTP codes are to be retried. Default value: [500, 502, 503, 504, 408] |
57 | RETRY_TIMES It defines maximum number of times for retry. Default value: 2 |
58 | TELNETCONSOLE_HOST It defines an interface on which the telnet console must listen. Default value: '127.0.0.1' |
59 | TELNETCONSOLE_PORT It defines the port range to be used for the telnet console. Default value: [6023, 6073] |
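
As a rough illustration of how a few of the settings in the table might be combined, the following is a minimal sketch of a project `settings.py` fragment. All values are illustrative assumptions rather than recommendations, and the `CUSTOM_SETTINGS` name is hypothetical; a dictionary like it could be assigned to a spider's `custom_settings` attribute to override project-wide values for that spider only.

```python
# Hypothetical settings.py fragment; every value here is an illustrative assumption.

# Throttle politely: start with a 5 s delay and never wait longer than 60 s.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Cache responses on disk for one day and skip caching of server errors.
HTTPCACHE_ENABLED = True
HTTPCACHE_DIR = "httpcache"
HTTPCACHE_EXPIRATION_SECS = 24 * 60 * 60
HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]

# Retry transient failures up to three times.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]

# Hypothetical per-spider overrides: stop after 1000 items or one hour,
# whichever comes first. Could be used as a spider's `custom_settings`.
CUSTOM_SETTINGS = {
    "CLOSESPIDER_ITEMCOUNT": 1000,
    "CLOSESPIDER_TIMEOUT": 3600,
}
```

Any setting that is not listed keeps its default value from the table above.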