Beautiful Soup - encode() Method



Method Description

The encode() method in Beautiful Soup renders a bytestring representation of the given PageElement and its contents.

The prettify() method, which lets you easily visualize the structure of the Beautiful Soup parse tree, accepts an encoding argument. The encode() method plays the same role as that argument: it returns the rendered markup as bytes in the chosen encoding, but without the extra indentation that prettify() adds.
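As a rough illustration of that relationship, the sketch below calls both methods on the same small document. The markup and variable names are made up for this example. It assumes prettify() returns a str when no encoding is given and bytes when one is.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the examples on this page
soup = BeautifulSoup("<p>café</p>", "html.parser")

pretty_text = soup.prettify()                    # str, indented
pretty_bytes = soup.prettify(encoding="utf-8")   # bytes, indented
compact_bytes = soup.encode("utf-8")             # bytes, no added indentation

print(type(pretty_text).__name__)
print(type(pretty_bytes).__name__)
print(compact_bytes)
```

Use encode() when the bytes are meant for a file or a network response, and prettify() when the output is meant to be read by a person.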

Syntax

encode(encoding="utf-8", indent_level=None, formatter="minimal", errors="xmlcharrefreplace")

Parameters

  • encoding − The destination encoding.

  • indent_level − Each line of the rendering will be indented this many levels. Used internally in recursive calls while pretty-printing.

  • formatter − A Formatter object, or a string naming one of the standard formatters.

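  • errors − An error handling strategy, such as "xmlcharrefreplace" (the default), applied to characters that cannot be represented in the destination encoding.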
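The effect of the errors parameter is easiest to see with a character that the destination encoding cannot represent. The following is a minimal sketch, assuming the errors value is handed on to Python's own str.encode(). The snowman markup and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# U+2603 SNOWMAN has no ASCII representation
soup = BeautifulSoup("<b>\N{SNOWMAN}</b>", "html.parser")

# Default strategy: unencodable characters become numeric character references
as_reference = soup.b.encode("ascii")

# "replace" strategy: unencodable characters become a placeholder
as_placeholder = soup.b.encode("ascii", errors="replace")

print(as_reference)
print(as_placeholder)
```

With the default strategy the character survives as a numeric reference that a browser can still display, whereas "replace" loses it.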

Return Value

The encode() method returns a byte string representation of the tag and its contents.

Example 1

The encoding parameter is utf-8 by default. The following code prints the encoded byte string representation of the soup object.

from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello “World!”", 'html.parser')
print(soup.encode('utf-8'))

Output

b'Hello \xe2\x80\x9cWorld!\xe2\x80\x9d'

Example 2

The formatter argument accepts the following predefined values −

formatter="minimal" − This is the default. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.

formatter="html" − Beautiful Soup will convert Unicode characters to HTML entities whenever possible.

formatter="html5" − Similar to formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like "br".

formatter=None − Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.

In the following example, different formatter values are passed as the argument to the encode() method.

from bs4 import BeautifulSoup

french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print("minimal: ")
print(soup.p.encode(formatter="minimal"))
print("html: ")
print(soup.p.encode(formatter="html"))
print("None: ")
print(soup.p.encode(formatter=None))

Output

minimal: 
b'<p>Il a dit &lt;&lt;Sacr\xc3\xa9 bleu!&gt;&gt;</p>'
html: 
b'<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>'
None: 
b'<p>Il a dit <<Sacr\xc3\xa9 bleu!>></p>'
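The "html5" formatter is described in the list above but not exercised in the example. A small sketch of the difference it makes on a void tag might look like this (the markup and variable names are chosen for illustration):

```python
from bs4 import BeautifulSoup

# A single void tag is enough to show the difference
br = BeautifulSoup("<br>", "html.parser").br

as_html = br.encode(formatter="html")     # keeps the closing slash
as_html5 = br.encode(formatter="html5")   # omits the closing slash

print(as_html)
print(as_html5)
```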

Example 3

The following example uses Latin-1 as the encoding parameter.

from bs4 import BeautifulSoup

markup = '''
<html>
   <head>
      <meta content="text/html; charset=ISO-Latin-1" http-equiv="Content-type" />
   </head>
   <body>
      <p>Sacré bleu!</p>
   </body>
</html>
'''

soup = BeautifulSoup(markup, 'lxml')
print(soup.p.encode("latin-1"))

Output

b'<p>Sacr\xe9 bleu!</p>'
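To make the effect of the destination encoding visible, the sketch below encodes one tag containing a non-ASCII character under two encodings. It uses html.parser rather than lxml only to avoid an extra dependency. The markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Sacré bleu!</p>", "html.parser")

# The same tag, rendered under two destination encodings
utf8_bytes = soup.p.encode("utf-8")       # é takes two bytes
latin1_bytes = soup.p.encode("latin-1")   # é takes one byte

print(utf8_bytes)
print(latin1_bytes)
```

The two results differ only in how the accented character is stored, so whoever reads the bytes back must decode them with the same encoding that was used here.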