Scrapy - Overview



Scrapy is a fast, open-source web crawling framework written in Python. It is used to extract data from web pages with the help of selectors based on XPath.

Scrapy was first released on June 26, 2008, under the BSD license, and the milestone 1.0 release followed in June 2015.

Why Use Scrapy?

  • It makes large crawling projects easier to build and scale.

  • It has a built-in mechanism called Selectors for extracting data from websites.

  • It handles requests asynchronously, which makes it fast.

  • It can automatically adjust crawling speed through its AutoThrottle extension.

  • It is easy for developers to pick up and use.
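Asynchronous request handling and automatic throttling are both controlled from a project's `settings.py`. The sketch below shows the relevant settings; the values shown are Scrapy's documented defaults (except `AUTOTHROTTLE_ENABLED`, which is off by default) and are meant as a starting point, not as recommendations for any particular site.

```python
# settings.py (sketch): illustrative values for a Scrapy project.

# Scrapy issues requests concurrently; this caps how many are in flight at once.
CONCURRENT_REQUESTS = 16

# AutoThrottle is disabled by default; enable it to let Scrapy adapt its delay
# to the observed server latency.
AUTOTHROTTLE_ENABLED = True
# Initial download delay, in seconds, before any latency has been measured.
AUTOTHROTTLE_START_DELAY = 5.0
# Upper bound on the download delay when the server responds slowly.
AUTOTHROTTLE_MAX_DELAY = 60.0
# Average number of requests Scrapy aims to send in parallel to each remote site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Set to True to log throttling stats for every response received.
AUTOTHROTTLE_DEBUG = False
```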

Features of Scrapy

  • Scrapy is an open-source web crawling framework that is free to use.

  • Scrapy generates feed exports in formats such as JSON, CSV, and XML.

  • Scrapy has built-in support for selecting and extracting data using either XPath or CSS expressions.

  • Because Scrapy is built around crawlers (called spiders), it can extract data from web pages automatically.

Advantages

  • Scrapy is easily extensible, fast, and powerful.

  • It is a cross-platform application framework (Windows, Linux, Mac OS and BSD).

  • Scrapy requests are scheduled and processed asynchronously.

  • Scrapy projects can be deployed to Scrapyd, a companion service that lets you upload projects and control spiders through a JSON web service.

  • It can scrape a website even when that website offers no API for raw data access.
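Scrapyd is installed separately (`pip install scrapyd`) and by default listens on port 6800. The sketch below uses only the standard library to call its `schedule.json` endpoint, which starts a spider run; the project name `myproject` and spider name `products` are placeholders, and sending the request assumes a Scrapyd instance is already running with that project deployed.

```python
import json
import urllib.parse
import urllib.request


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    # Supplying a body makes urllib send this as a POST request.
    return urllib.request.Request(f"{base_url}/schedule.json", data=data)


def schedule_spider(project, spider, base_url="http://localhost:6800"):
    """Send the request and decode Scrapyd's JSON reply (needs a running Scrapyd)."""
    req = build_schedule_request(project, spider, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
```

Splitting request construction from sending keeps the HTTP details inspectable without a live server.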

Disadvantages

  • Scrapy is available only for Python. Releases before 1.1 supported only Python 2.7, and Scrapy 2.0 and later require Python 3.

  • The installation steps differ between operating systems.
