Beautiful Soup - prettify() Method



Method Description

To get a nicely formatted Unicode string, use Beautiful Soup's prettify() method. It formats the Beautiful Soup parse tree so that each tag and each string is on its own line, indented according to its depth. This lets you easily visualize the structure of the parse tree.

Syntax

prettify(encoding=None, formatter="minimal")

Parameters

  • encoding − The eventual encoding of the string. If this is None, a Unicode string will be returned.

  • formatter − A Formatter object, or a string naming one of the standard formatters.

Return Type

The prettify() method returns a Unicode string (if encoding is None) or a bytestring (otherwise).
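A minimal sketch of the two return types, using a made-up one-line document and the built-in html.parser so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")

# No encoding given: the result is a Unicode string (str).
as_text = soup.prettify()

# An encoding given: the result is a bytestring in that encoding.
as_bytes = soup.prettify(encoding="utf-8")

print(type(as_text).__name__)
print(type(as_bytes).__name__)
```

The bytestring form is convenient when the result is written straight to a file opened in binary mode.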

Example 1

Consider the following HTML string.

<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>

Using the prettify() method we can better understand its structure −

from bs4 import BeautifulSoup

html = '''
<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>
'''

# The lxml parser wraps the fragment in <html> and <body> tags
soup = BeautifulSoup(html, "lxml")
print(soup.prettify())

Output

<html>
 <body>
  <p>
   The quick,
   <b>
    brown fox
   </b>
   jumps over a lazy dog.
  </p>
 </body>
</html>

Example 2

You can call prettify() on any of the Tag objects in the document. Continuing with the soup object from Example 1 −

print(soup.b.prettify())

Output

<b>
 brown fox
</b>

The prettify() method is meant for inspecting the structure of a document. It should not be used to reformat one: the whitespace it adds (in the form of newlines) changes the meaning of an HTML document.
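The effect of the added whitespace can be seen in a small sketch. The markup here is invented for the illustration: two inline tags sit next to each other with no space between them, and the prettified output is parsed again to compare the text content.

```python
from bs4 import BeautifulSoup

html = "<p><b>Beautiful</b><i>Soup</i></p>"
soup = BeautifulSoup(html, "html.parser")

# Text of the document as written: the two words are joined.
original_text = soup.get_text()

# Parse the prettified markup again; the newlines and indentation
# that prettify() added are now part of the text.
reparsed = BeautifulSoup(soup.prettify(), "html.parser")
reparsed_text = reparsed.get_text()

print(repr(original_text))
print(repr(reparsed_text))

# str() serializes the tree without adding any whitespace.
print(str(soup))
```

When the goal is to write the document back out unchanged, str(soup) or soup.decode() is the safer choice, since neither adds whitespace.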

The prettify() method can optionally be given a formatter argument that specifies how strings are processed on output.

The formatter argument accepts the following standard values.

formatter="minimal" − This is the default. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.

formatter="html" − Beautiful Soup will convert Unicode characters to HTML entities whenever possible.

formatter="html5" − This is similar to formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like "br".

formatter=None − Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.
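The difference between the "html" and "html5" formatters is easiest to see on a void tag. The following is a small sketch using a lone br tag; the variable names are chosen for this example only.

```python
from bs4 import BeautifulSoup

br = BeautifulSoup("<br>", "html.parser").br

# strip() removes the trailing newline that prettify() appends.
as_html = br.prettify(formatter="html").strip()
as_html5 = br.prettify(formatter="html5").strip()

print(as_html)   # closing slash kept
print(as_html5)  # closing slash omitted
```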

Example 3

from bs4 import BeautifulSoup

# The source uses the entities &lt;, &gt; and &eacute;. The parser decodes
# them to <, > and é, and the formatter decides how they are written back.
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print("minimal: ")
print(soup.prettify(formatter="minimal"))
print("html: ")
print(soup.prettify(formatter="html"))
print("None: ")
print(soup.prettify(formatter=None))

Output

minimal: 
<p>
 Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
</p>
html: 
<p>
 Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;
</p>
None: 
<p>
 Il a dit <<Sacré bleu!>>
</p>