How to find the children of nodes using BeautifulSoup?
BeautifulSoup is a popular Python library used for web scraping. It provides a simple and intuitive interface to parse HTML and XML documents, making it easy to extract useful information from them. In this tutorial, we will explore how to find children of nodes using BeautifulSoup.
Before we dive into the technical details, it is important to understand what "nodes" are in the context of HTML and XML documents. Nodes are the basic building blocks of these documents, and they represent different elements such as tags, attributes, text, comments, and so on.
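As a quick illustration of these node types, the sketch below inspects the direct contents of a small tag. It assumes BeautifulSoup is installed; Tag, NavigableString, and Comment are BeautifulSoup's own node classes, and the snippet string here is made up purely for demonstration −

```python
from bs4 import BeautifulSoup

# A tiny made-up document containing a tag, plain text, and a comment
snippet = "<p>Hello <b>world</b><!-- a comment --></p>"
soup = BeautifulSoup(snippet, 'html.parser')

# Each item in .contents is a node; its class tells us what kind it is
for node in soup.p.contents:
    print(type(node).__name__, repr(str(node)))
```

Running this shows that the paragraph holds three different kinds of nodes: a NavigableString for the text, a Tag for the nested bold element, and a Comment.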
Setting Up BeautifulSoup
To find children of nodes using BeautifulSoup, we first need to create a BeautifulSoup object from the HTML document we want to parse −
from bs4 import BeautifulSoup

html_doc = """
<html>
<head>
<title>Example</title>
</head>
<body>
<div class="content">
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
<html>
 <head>
  <title>
   Example
  </title>
 </head>
 <body>
  <div class="content">
   <h1>
    Heading
   </h1>
   <p>
    Paragraph 1
   </p>
   <p>
    Paragraph 2
   </p>
  </div>
 </body>
</html>
Using find() and find_all() Methods
The find() method searches for the first occurrence of a tag, while find_all() returns all matching elements −
from bs4 import BeautifulSoup
html_doc = """
<div class="content">
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find('div', {'class': 'content'})
paragraphs = div.find_all('p')
for p in paragraphs:
    print(p.text)
Paragraph 1
Paragraph 2
Using the children Property
The children property returns an iterator over all direct children of a node −
from bs4 import BeautifulSoup
html_doc = """
<div class="content">
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find('div', {'class': 'content'})
for child in div.children:
    if child.name:  # Skip whitespace text nodes
        print(child)
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
Using the descendants Property
The descendants property iterates over all descendants, including children, grandchildren, and so on −
from bs4 import BeautifulSoup
html_doc = """
<div class="content">
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find('div', {'class': 'content'})
for descendant in div.descendants:
    if descendant.name:  # Skip whitespace text nodes
        print(f"Tag: {descendant.name}")
    elif descendant.strip():  # Print non-empty text content
        print(f"Text: {descendant.strip()}")
Tag: h1
Text: Heading
Tag: p
Text: Paragraph 1
Tag: p
Text: Paragraph 2
Finding Next Sibling
Use find_next_sibling() to find the next sibling element that matches criteria −
from bs4 import BeautifulSoup
html_doc = """
<div class="content">
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find('div', {'class': 'content'})
first_p = div.find('p')
next_p = first_p.find_next_sibling('p')
print(f"First paragraph: {first_p.text}")
print(f"Next paragraph: {next_p.text}")
First paragraph: Paragraph 1
Next paragraph: Paragraph 2
Using CSS Selectors
CSS selectors provide a powerful way to find elements using the select() method −
from bs4 import BeautifulSoup
html_doc = """
<div class="content">
<h1>Heading</h1>
<p class="intro">Introduction paragraph</p>
<p>Regular paragraph</p>
<a href="https://example.com">External link</a>
<a href="/internal">Internal link</a>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# Select all paragraphs within div
paragraphs = soup.select('div p')
print("All paragraphs:")
for p in paragraphs:
    print(f" {p.text}")
print("\nExternal links:")
# Select links with href starting with 'https://'
external_links = soup.select('a[href^="https://"]')
for link in external_links:
    print(f" {link.text} -> {link['href']}")
All paragraphs:
 Introduction paragraph
 Regular paragraph

External links:
 External link -> https://example.com
Comparison of Methods
| Method | Returns | Best For |
|---|---|---|
| find_all() | List of elements | Finding all matching children |
| children | Iterator of direct children | Iterating through immediate children |
| descendants | Iterator of all descendants | Deep traversal of nested elements |
| select() | List of elements | Complex CSS-based selection |
Conclusion
BeautifulSoup provides multiple methods to find children of nodes: find_all() for matching elements, children for direct children, descendants for all descendants, and select() for CSS-based selection. Choose the method that best fits your specific parsing needs.
