Beautiful Soup - select() Method



Method Description

In Beautiful Soup library, the select() method is an important tool for scraping the HTML/XML document. Similar to find() and find_*() methods, the select() method also helps in locating an element that satisfies a given criteria. The selection of an element in the document tree is done based on the CSS selector given to it as an argument.

Beautiful Soup also has select_one() method. Difference in select() and select_one() is that, select() returns a ResultSet of all the elements belonging to the PageElement and characterized by the CSS selector; whereas select_one() returns the first occurrence of the element satisfying the CSS selector based selection criteria.

Prior to Beautiful Soup version 4.7, the select() method used to be able to support only the common CSS selectors. With version 4.7, Beautiful Soup was integrated with Soup Sieve CSS selector library. As a result, much more selectors can now be used. In the version 4.12, a .css property has been added in addition to the existing convenience methods, select() and select_one().

Syntax

select(selector, limit, **kwargs)

Parameters

  • selector − A string containing a CSS selector.

  • limit − After finding this number of results, stop looking.

  • kwargs − Keyword arguments to be passed.

If the limit parameter is set to 1, it becomes equivalent to select_one() method.

Return Value

The select() method returns a ResultSet of Tag objects. The select_one() method returns a single Tag object.

The Soup Sieve library has different types of CSS selectors. The basic CSS selectors are −

  • Type selectors match elements by node name. For example −

tags = soup.select('div')
  • The Universal selector (*) matches elements of any type. Example −

tags = soup.select('*')
  • The ID selector matches an element based on its id attribute. The symbol # denotes the ID selector. Example −

tags = soup.select("#nm")
  • The class selector matches an element based on the values contained in the class attribute. The . symbol prefixed to the class name is the CSS class selector. Example −

tags = soup.select(".submenu")

Example: Type Selector

from bs4 import BeautifulSoup, NavigableString

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')

tags = soup.select('div')
print (tags)

Output

[<div id="Languages">
<p>Java</p> <p>Python</p> <p>C++</p>
</div>]

Example: ID selector

from bs4 import BeautifulSoup

html = '''
   <form>
      <input type = 'text' id = 'nm' name = 'name'>
      <input type = 'text' id = 'age' name = 'age'>
      <input type = 'text' id = 'marks' name = 'marks'>
   </form>
'''
soup = BeautifulSoup(html, 'html.parser')
obj = soup.select("#nm")
print (obj)

Output

[<input id="nm" name="name" type="text"/>]

Example: class selector

html = '''
   <ul>
      <li class="mainmenu">Accounts</li>
      <ul>
         <li class="submenu">Anand</li>
         <li class="submenu">Mahesh</li>
      </ul>
      <li class="mainmenu">HR</li>
      <ul>
         <li class="submenu">Rani</li>
         <li class="submenu">Ankita</li>
      </ul>
   </ul>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tags = soup.select(".mainmenu")
print (tags)

Output

[<li class="mainmenu">Accounts</li>, <li class="mainmenu">HR</li>]
Advertisements