- Beautiful Soup Tutorial
- Beautiful Soup - Home
- Beautiful Soup - Overview
- Beautiful Soup - Web Scraping
- Beautiful Soup - Installation
- Beautiful Soup - Souping the Page
- Beautiful Soup - Kinds of objects
- Beautiful Soup - Inspect Data Source
- Beautiful Soup - Scrape HTML Content
- Beautiful Soup - Navigating by Tags
- Beautiful Soup - Find Elements by ID
- Beautiful Soup - Find Elements by Class
- Beautiful Soup - Find Elements by Attribute
- Beautiful Soup - Searching the Tree
- Beautiful Soup - Modifying the Tree
- Beautiful Soup - Parsing a Section of a Document
- Beautiful Soup - Find all Children of an Element
- Beautiful Soup - Find Element using CSS Selectors
- Beautiful Soup - Find all Comments
- Beautiful Soup - Scraping List from HTML
- Beautiful Soup - Scraping Paragraphs from HTML
- BeautifulSoup - Scraping Link from HTML
- Beautiful Soup - Get all HTML Tags
- Beautiful Soup - Get Text Inside Tag
- Beautiful Soup - Find all Headings
- Beautiful Soup - Extract Title Tag
- Beautiful Soup - Extract Email IDs
- Beautiful Soup - Scrape Nested Tags
- Beautiful Soup - Parsing Tables
- Beautiful Soup - Selecting nth Child
- Beautiful Soup - Search by text inside a Tag
- Beautiful Soup - Remove HTML Tags
- Beautiful Soup - Remove all Styles
- Beautiful Soup - Remove all Scripts
- Beautiful Soup - Remove Empty Tags
- Beautiful Soup - Remove Child Elements
- Beautiful Soup - find vs find_all
- Beautiful Soup - Specifying the Parser
- Beautiful Soup - Comparing Objects
- Beautiful Soup - Copying Objects
- Beautiful Soup - Get Tag Position
- Beautiful Soup - Encoding
- Beautiful Soup - Output Formatting
- Beautiful Soup - Pretty Printing
- Beautiful Soup - NavigableString Class
- Beautiful Soup - Convert Object to String
- Beautiful Soup - Convert HTML to Text
- Beautiful Soup - Parsing XML
- Beautiful Soup - Error Handling
- Beautiful Soup - Trouble Shooting
- Beautiful Soup - Porting Old Code
- Beautiful Soup - Functions Reference
- Beautiful Soup - contents Property
- Beautiful Soup - children Property
- Beautiful Soup - string Property
- Beautiful Soup - strings Property
- Beautiful Soup - stripped_strings Property
- Beautiful Soup - descendants Property
- Beautiful Soup - parent Property
- Beautiful Soup - parents Property
- Beautiful Soup - next_sibling Property
- Beautiful Soup - previous_sibling Property
- Beautiful Soup - next_siblings Property
- Beautiful Soup - previous_siblings Property
- Beautiful Soup - next_element Property
- Beautiful Soup - previous_element Property
- Beautiful Soup - next_elements Property
- Beautiful Soup - previous_elements Property
- Beautiful Soup - find Method
- Beautiful Soup - find_all Method
- Beautiful Soup - find_parents Method
- Beautiful Soup - find_parent Method
- Beautiful Soup - find_next_siblings Method
- Beautiful Soup - find_next_sibling Method
- Beautiful Soup - find_previous_siblings Method
- Beautiful Soup - find_previous_sibling Method
- Beautiful Soup - find_all_next Method
- Beautiful Soup - find_next Method
- Beautiful Soup - find_all_previous Method
- Beautiful Soup - find_previous Method
- Beautiful Soup - select Method
- Beautiful Soup - append Method
- Beautiful Soup - extend Method
- Beautiful Soup - NavigableString Method
- Beautiful Soup - new_tag Method
- Beautiful Soup - insert Method
- Beautiful Soup - insert_before Method
- Beautiful Soup - insert_after Method
- Beautiful Soup - clear Method
- Beautiful Soup - extract Method
- Beautiful Soup - decompose Method
- Beautiful Soup - replace_with Method
- Beautiful Soup - wrap Method
- Beautiful Soup - unwrap Method
- Beautiful Soup - smooth Method
- Beautiful Soup - prettify Method
- Beautiful Soup - encode Method
- Beautiful Soup - decode Method
- Beautiful Soup - get_text Method
- Beautiful Soup - diagnose Method
- Beautiful Soup Useful Resources
- Beautiful Soup - Quick Guide
- Beautiful Soup - Useful Resources
- Beautiful Soup - Discussion
Beautiful Soup - Modifying the Tree
One of the powerful features of Beautiful Soup library is to be able to be able to manipulate the parsed HTML or XML document and modify its contents.
Beautiful Soup library has different functions to perform the following operations −
Add contents or a new tag to an existing tag of the document
Insert contents before or after an existing tag or string
Clear the contents of an already existing tag
Modify the contents of a tag element
Add content
You can add to the content of an existing tag by using append() method on a Tag object. It works like the append() method of Python's list object.
In the following example, the HTML script has a <p> tag. With append(), additional text is appended.
Example
from bs4 import BeautifulSoup markup = '<p>Hello</p>' soup = BeautifulSoup(markup, 'html.parser') print (soup) tag = soup.p tag.append(" World") print (soup)
Output
<p>Hello</p> <p>Hello World</p>
With the append() method, you can add a new tag at the end of an existing tag. First create a new Tag object with new_tag() method and then pass it to the append() method.
Example
from bs4 import BeautifulSoup, Tag markup = '<b>Hello</b>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.b tag1 = soup.new_tag('i') tag1.string = 'World' tag.append(tag1) print (soup.prettify())
Output
<b> Hello <i> World </i> </b>
If you have to add a string to the document, you can append a NavigableString object.
Example
from bs4 import BeautifulSoup, NavigableString markup = '<b>Hello</b>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.b new_string = NavigableString(" World") tag.append(new_string) print (soup.prettify())
Output
<b> Hello World </b>
From Beautiful Soup version 4.7 onwards, the extend() method has been added to Tag class. It adds all the elements in a list to the tag.
Example
from bs4 import BeautifulSoup markup = '<b>Hello</b>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.b vals = ['World.', 'Welcome to ', 'TutorialsPoint'] tag.extend(vals) print (soup.prettify())
Output
<b> Hello World. Welcome to TutorialsPoint </b>
Insert Contents
Instead of adding a new element at the end, you can use insert() method to add an element at the given position in a the list of children of a Tag element. The insert() method in Beautiful Soup behaves similar to insert() on a Python list object.
In the following example, a new string is added to the <b> tag at position 1. The resultant parsed document shows the result.
Example
from bs4 import BeautifulSoup, NavigableString markup = '<b>Excellent </b><u>from TutorialsPoint</u>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.b tag.insert(1, "Tutorial ") print (soup.prettify())
Output
<b> Excellent Tutorial </b> <u> from TutorialsPoint </u>
Beautiful Soup also has insert_before() and insert_after() methods. Their respective purpose is to insert a tag or a string before or after a given Tag object. The following code shows that a string "Python Tutorial" is added after the <b> tag.
Example
from bs4 import BeautifulSoup, NavigableString markup = '<b>Excellent </b><u>from TutorialsPoint</u>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.b tag.insert_after("Python Tutorial") print (soup.prettify())
Output
<b> Excellent </b> Python Tutorial <u> from TutorialsPoint </u>
On the other hand, insert_before() method is used below, to add "Here is an " text before the <b> tag.
tag.insert_before("Here is an ") print (soup.prettify())
Output
Here is an <b> Excellent </b> Python Tutorial <u> from TutorialsPoint </u>
Clear the Contents
Beautiful Soup provides more than one ways to remove contents of an element from the document tree. Each of these methods has its unique features.
The clear() method is the most straight-forward. It simply removes the contents of a specified Tag element. Following example shows its usage.
Example
from bs4 import BeautifulSoup, NavigableString markup = '<b>Excellent </b><u>from TutorialsPoint</u>' soup = BeautifulSoup(markup, 'html.parser') tag = soup.find('u') tag.clear() print (soup.prettify())
Output
<b> Excellent </b> <u> </u>
It can be seen that the clear() method removes the contents, keeping the tag intact.
For the following example, we parse the following HTML document and call clear() metho on all tags.
<html> <body> <p> The quick, brown fox jumps over a lazy dog.</p> <p> DJs flock by when MTV ax quiz prog.</p> <p> Junk MTV quiz graced by fox whelps.</p> <p> Bawds jog, flick quartz, vex nymphs./p> </body> </html>
Here is the Python code using clear() method
Example
from bs4 import BeautifulSoup fp = open('index.html') soup = BeautifulSoup(fp, 'html.parser') tags = soup.find_all() for tag in tags: tag.clear() print (soup.prettify())
Output
<html> </html>
The extract() method removes either a tag or a string from the document tree, and returns the object that was removed.
Example
from bs4 import BeautifulSoup fp = open('index.html') soup = BeautifulSoup(fp, 'html.parser') tags = soup.find_all() for tag in tags: obj = tag.extract() print ("Extracted:",obj) print (soup)
Output
Extracted: <html> <body> <p> The quick, brown fox jumps over a lazy dog.</p> <p> DJs flock by when MTV ax quiz prog.</p> <p> Junk MTV quiz graced by fox whelps.</p> <p> Bawds jog, flick quartz, vex nymphs.</p> </body> </html> Extracted: <body> <p> The quick, brown fox jumps over a lazy dog.</p> <p> DJs flock by when MTV ax quiz prog.</p> <p> Junk MTV quiz graced by fox whelps.</p> <p> Bawds jog, flick quartz, vex nymphs.</p> </body> Extracted: <p> The quick, brown fox jumps over a lazy dog.</p> Extracted: <p> DJs flock by when MTV ax quiz prog.</p> Extracted: <p> Junk MTV quiz graced by fox whelps.</p> Extracted: <p> Bawds jog, flick quartz, vex nymphs.</p>
You can extract either a tag or a string. The following example shows antag being extracted.
Example
html = ''' <ol id="HR"> <li>Rani</li> <li>Ankita</li> </ol> ''' from bs4 import BeautifulSoup soup = BeautifulSoup(html, 'html.parser') obj=soup.find('ol') obj.find_next().extract() print (soup)
Output
<ol id="HR"> <li>Ankita</li> </ol>
Change the extract() statement to remove inner text of first <li> element.
Example
obj.find_next().string.extract()
Output
<ol id="HR"> <li>Ankita</li> </ol>
There is another method decompose() that removes a tag from the tree, then completely destroys it and its contents −
Example
html = ''' <ol id="HR"> <li>Rani</li> <li>Ankita</li> </ol> ''' from bs4 import BeautifulSoup soup = BeautifulSoup(html, 'html.parser') tag1=soup.find('ol') tag2 = soup.find('li') tag2.decompose() print (soup) print (tag2.decomposed)
Output
<ol id="HR"> <li>Ankita</li> </ol>
The decomposed property returns True or False - whether an element has been decomposed or not.
Modify the Contents
We shall look at the replace_with() method that allows contents of a tag to be replaced.
Just as a Python string, which is immutable, the NavigableString also can't be modified in place. However, use replace_with() to replace the inner string of a tag with another.
Example
from bs4 import BeautifulSoup soup = BeautifulSoup("<h2 id='message'>Hello, Tutorialspoint!</h2>",'html.parser') tag = soup.h2 tag.string.replace_with("OnLine Tutorials Library") print (tag.string)
Output
OnLine Tutorials Library
Here is another example to show the use of replace_with(). Two parsed documents can be combined if you pass a BeautifulSoup object as an argument to a certain function such as replace_with().2524
Example
from bs4 import BeautifulSoup obj1 = BeautifulSoup("<book><title>Python</title></book>", features="xml") obj2 = BeautifulSoup("<b>Beautiful Soup parser</b>", "lxml") obj2.find('b').replace_with(obj1) print (obj2)
Output
<html><body><book><title>Python</title></book></body></html>
The wrap() method wraps an element in the tag you specify. It returns the new wrapper.
from bs4 import BeautifulSoup soup = BeautifulSoup("<p>Hello Python</p>", 'html.parser') tag = soup.p newtag = soup.new_tag('b') tag.string.wrap(newtag) print (soup)
Output
<p><b>Hello Python</b></p>
On the other hand, the unwrap() method replaces a tag with whatever's inside that tag. It's good for stripping out markup.
Example
from bs4 import BeautifulSoup soup = BeautifulSoup("<p>Hello <b>Python</b></p>", 'html.parser') tag = soup.p tag.b.unwrap() print (soup)
Output
<p>Hello Python</p>