Beautiful Soup - find vs find_all



Beautiful Soup library includes find() as well as find_all() methods. Both methods are one of the most frequently used methods while parsing HTML or XML documents. From a particular document tree You often need to locate a PageElement of a certain tag type, or having certain attributes, or having a certain CSS style etc. These criteria are given as argument to both find() and find_all() methods. The main point of difference between the two is that while find() locates the very first child element that satisfies the criteria, find_all() method searches for all the children elements of the criteria.

The find() method is defined with following syntax −

Syntax

find(name, attrs, recursive, string, **kwargs)

The name argument specifies a filter on tag name. With attrs, a filter on tag attribute values can be set up. The recursive argument forces a recursive search if it is True. You can pass variable kwargs as dictionary of filters on attribute values.

soup.find(id = 'nm')
soup.find(attrs={"name":'marks'})

The find_all() method takes all the arguments as for the find() method, in addition there is a limit argument. It is an integer, restricting the search the specified number of occurrences of the given filter criteria. If not set, find_all() searches for the criteria among all the children under the said PageElement.

soup.find_all('input')
lst=soup.find_all('li', limit =2)

If the limit argument for find_all() method is set to 1, it virtually acts as find() method.

The return type of both the methods differs. The find() method returns either a Tag object or a NavigableString object first found. The find_all() method returns a ResultSet consisting of all the PageElements satisfying the filter criteria.

Here is an example that demonstrates the difference between find and find_all methods.

Example

from bs4 import BeautifulSoup

markup =open("index.html")

soup = BeautifulSoup(markup, 'html.parser')
ret1 = soup.find('input')
ret2 = soup.find_all ('input')
print (ret1, 'Return type of find:', type(ret1))
print (ret2)
print ('Return tyoe find_all:', type(ret2))

#set limit =1
ret3 = soup.find_all ('input', limit=1)
print ('find:', ret1)
print ('find_all:', ret3)

Output

<input id="nm" name="name" type="text"/> Return type of find: <class 'bs4.element.Tag'>
[<input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>, <input id="marks" name="marks" type="text"/>]
Return tyoe find_all: <class 'bs4.element.ResultSet'>
find: <input id="nm" name="name" type="text"/>
find_all: [<input id="nm" name="name" type="text"/>]
Advertisements