How can TensorFlow be used to convert between different string representations?


An encoded string scalar can be converted to a vector of code points using 'tf.strings.unicode_decode'. A vector of code points can be converted back to an encoded string scalar using 'tf.strings.unicode_encode'. An encoded string scalar can be converted to a different encoding using 'tf.strings.unicode_transcode'.


Let us see how to represent Unicode strings in Python and manipulate them with TensorFlow's Unicode equivalents of the standard string ops. These ops can also be used to separate Unicode strings into tokens based on script detection.
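As a rough illustration of the script detection mentioned above, the sketch below looks up the script of two code points with 'tf.strings.unicode_script'. The sample characters are an assumption chosen for illustration; the returned integers are ICU script codes (17 for Han, 8 for Cyrillic).

```python
import tensorflow as tf

# Illustrative code points (an assumption, not from this article):
# '芸' is a Han character and 'Б' is a Cyrillic character.
code_points = tf.constant([ord("芸"), ord("Б")], dtype=tf.int32)

# unicode_script maps each code point to an ICU UScriptCode integer.
script_ids = tf.strings.unicode_script(code_points)

print(script_ids.numpy())
```

A tokenizer could start a new token wherever the script code changes between neighbouring characters.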

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and offers free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

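import tensorflow as tf

# Sample text from the credited tutorial: as a UTF-8 string scalar and as a vector of code points
text_utf8 = tf.constant(u"语言处理")
text_chars = tf.constant([ord(char) for char in u"语言处理"])

print("Converting encoded string scalar to a vector of code points")
tf.strings.unicode_decode(text_utf8, input_encoding='UTF-8')
print("Converting vector of code points to an encoded string scalar")
tf.strings.unicode_encode(text_chars, output_encoding='UTF-8')
print("Converting encoded string scalar to a different encoding")
tf.strings.unicode_transcode(text_utf8, input_encoding='UTF8', output_encoding='UTF-16-BE')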

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

Converting encoded string scalar to a vector of code points
Converting vector of code points to an encoded string scalar
Converting encoded string scalar to a different encoding
<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>

Explanation

  • The function 'unicode_decode' converts an encoded string scalar to a vector of code points.
  • The function 'unicode_encode' converts a vector of code points to an encoded string scalar.
  • The function 'unicode_transcode' converts an encoded string scalar to a different encoding.
  • A notebook cell displays only the value of its last expression, so the output shows just the transcoded UTF-16-BE tensor after the three printed messages.
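To see the result of all three conversions rather than only the last one, each result can be stored and printed explicitly. The following is a minimal sketch, assuming the same sample text as the credited tutorial; the variable names are illustrative.

```python
import tensorflow as tf

text = "语言处理"
text_utf8 = tf.constant(text)  # scalar string tensor holding UTF-8 bytes

# Decode: UTF-8 bytes -> vector of int32 code points
code_points = tf.strings.unicode_decode(text_utf8, input_encoding="UTF-8")

# Encode: vector of code points -> UTF-8 encoded string scalar
re_encoded = tf.strings.unicode_encode(code_points, output_encoding="UTF-8")

# Transcode: UTF-8 bytes -> UTF-16-BE bytes
utf16 = tf.strings.unicode_transcode(
    text_utf8, input_encoding="UTF-8", output_encoding="UTF-16-BE"
)

print(code_points.numpy())                 # one integer per character
print(re_encoded.numpy().decode("utf-8"))  # round trip back to the original text
print(utf16.numpy())                       # raw UTF-16-BE bytes
```

Decoding and then encoding with the same encoding returns the original bytes, which makes a quick sanity check for any pipeline built on these ops.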

Updated on: 19-Feb-2021
