Scrapy - SelectorList Objects
Selector Examples on HTML Response
The following examples work on an HTML response. Assume html_response is an HtmlResponse object; a Selector is instantiated from it as follows −
res = Selector(html_response)
You can select the h2 elements from the HTML response body; the call returns a SelectorList object −
>>> res.xpath("//h2")
To get the matched h2 elements as a list of strings, call extract() on the result −
>>> res.xpath("//h2").extract()
This returns each h2 element serialized as a string, including its tags.
>>> res.xpath("//h2/text()").extract()
This returns only the text inside each h2 element, without the h2 tags themselves.
You can iterate over the p elements and print the class attribute of each one −
for ele in res.xpath("//p"):
    print(ele.xpath("@class").extract())
Selector Examples on XML Response
The following examples work on an XML response. Assume xml_response is an XmlResponse object; a Selector is instantiated from it as follows −
res = Selector(xml_response)
You can select the description elements from the XML response body; the call returns a SelectorList object −
>>> res.xpath("//description")
You can get the price value from a Google Base XML feed by first registering its namespace and then using the registered prefix in the XPath −
>>> res.register_namespace("g", "http://base.google.com/ns/1.0")
>>> res.xpath("//g:price").extract()
Removing Namespaces
When working on a Scrapy project, you can strip all namespaces from a document with the Selector.remove_namespaces() method and then write XPaths using plain element names.
There are two reasons why namespace removal is not performed by default −
Removing namespaces requires iterating over the whole document and modifying every element, which is an expensive operation to perform on every document Scrapy crawls.
In some cases the namespaces are actually needed, because element names from different namespaces would otherwise clash. Such cases are rare.