Beautiful Soup - NavigableString Class



One of the main objects in the Beautiful Soup API is the NavigableString. It represents the text found between the opening and closing tags of an HTML element. For example, if <b>Hello</b> is the markup to be parsed, Hello is the NavigableString.

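The NavigableString class is a subclass of both Python's built-in str class and the PageElement class in the bs4 package. Hence, it supports str methods such as upper(), lower(), find() and isalpha(), as well as the PageElement methods for navigating and modifying the tree, such as find_next(), find_parent(), insert_before(), insert_after(), replace_with() and wrap(). Because a string cannot contain other elements, methods that add children, such as append() and insert(), are meant to be called on a Tag rather than on a NavigableString.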
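The dual inheritance described above can be checked directly. The following is a minimal sketch, assuming bs4 is installed; the variable name text is only illustrative.

```python
from bs4 import BeautifulSoup, NavigableString
from bs4.element import PageElement

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
text = soup.b.string                   # the text inside the <b> tag

print(type(text).__name__)             # class name of the object
print(isinstance(text, str))           # behaves like a Python string
print(isinstance(text, PageElement))   # and is also part of the parse tree
print(text.upper())                    # str method; returns a plain str
print(text.parent.name)                # tree navigation; name of the enclosing tag
```

Note that str methods such as upper() return an ordinary Python string, not a new NavigableString, so the result is no longer attached to the tree.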

The constructor of this class takes a single argument, a str object.

Example

from bs4 import NavigableString
new_str = NavigableString('world')

You can now place this NavigableString object in a parsed tree, for example by passing it to the append() or insert() method of a Tag.

In the following example, we append the newly created NavigableString object to an existing Tag object.

Example

from bs4 import BeautifulSoup, NavigableString

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')

tag = soup.b
new_str = NavigableString('world')
tag.append(new_str)
print(soup)

Output

<b>Helloworld</b>
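Appending to a tag is not the only option. As a hedged sketch, the insert_after() and replace_with() methods can also be called on a string that is already in the tree; the variable names below are illustrative, not part of the original example.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
hello = soup.b.string                            # existing NavigableString 'Hello'

# Add a new sibling string immediately after the existing one.
hello.insert_after(NavigableString(' world'))
after_insert = str(soup)
print(after_insert)

# Swap the original string for a different one.
hello.replace_with(NavigableString('Goodbye'))
after_replace = str(soup)
print(after_replace)
```

Here the new strings are positioned relative to an existing string rather than added as the last child of a tag.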

Note that a NavigableString is a PageElement, so it can also be appended to the BeautifulSoup object itself. Observe the difference when we do so.

Example

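from bs4 import BeautifulSoup, NavigableString
# start from a freshly parsed tree
soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
new_str = NavigableString('world')
soup.append(new_str)
print(soup)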

Output

<b>Hello</b>world

As we can see, the string appears after the <b> tag.

Beautiful Soup also offers a new_string() method, which creates a new NavigableString associated with the BeautifulSoup object it is called on.

Let us use the new_string() method to create a NavigableString object and add it to two different PageElements.

Example

from bs4 import BeautifulSoup

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')

tag = soup.b

ns = soup.new_string(' World')
tag.append(ns)
print(tag)
soup.append(ns)
print(soup)

Output

<b>Hello World</b>
<b>Hello</b> World

The result shows an interesting behaviour. The same NavigableString object is appended first to a tag inside the tree and then to the soup object itself. After the first append(), the tag shows the string. After the second, the text World appears at the end of the soup but is no longer inside the tag. This is because an element can occupy only one position in the tree: appending an object that already has a parent moves it to the new location instead of copying it.
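One way to see what happens in the example above is to inspect the string's parent after each append() call. The sketch below assumes the usual Beautiful Soup behaviour that an element already in the tree is moved rather than copied; the names first_parent and second_parent are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
tag = soup.b
ns = soup.new_string(' World')

tag.append(ns)
first_parent = ns.parent.name      # the string now lives inside <b>

soup.append(ns)
second_parent = ns.parent.name     # the same object now belongs to the soup
moved = str(soup)
print(first_parent, second_parent)
print(moved)

# To keep the text in both places, create a separate string for each location.
tag.append(soup.new_string(' World'))
both = str(soup)
print(both)
```

Creating one NavigableString per location avoids the move, since each object then has its own parent.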
