How can TensorFlow be used to work with character substrings in Python?


Character substrings can be extracted in TensorFlow using the ‘substr’ method, which is present in the ‘strings’ module of TensorFlow. The result is then converted to a NumPy value and displayed.

We will see how to represent Unicode strings in Python and manipulate them with the Unicode equivalents of the standard string operations. The same operations can also be used to separate Unicode strings into tokens based on script detection.
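As a plain-Python illustration of this representation (no TensorFlow required), the sketch below encodes a text string to UTF-8 and compares its length in characters with its length in bytes. The sample text and the variable names are assumptions chosen for illustration.

```python
# Assumed sample text: "Thanks " followed by one emoji (U+1F60A).
text = "Thanks \U0001F60A"

# A Python str is a sequence of Unicode code points.
code_points = [ord(ch) for ch in text]

# Encoding to UTF-8 produces a byte string, the form that string tensors hold.
encoded = text.encode("UTF-8")

print(len(text))     # number of characters (code points): 8
print(len(encoded))  # number of bytes after encoding: 11
print(encoded[7:])   # the 4 bytes of the emoji: b'\xf0\x9f\x98\x8a'
```

The seven ASCII characters take one byte each, while the emoji takes four, which is why character offsets and byte offsets diverge once non-ASCII text is involved.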

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

import tensorflow as tf

# Assumed input, as in the credited tutorial: "Thanks " followed by one emoji (U+1F60A)
thanks = "Thanks \U0001F60A".encode("UTF-8")

print("The default unit is byte")
print("When len is 1, a single byte is returned")
print(tf.strings.substr(thanks, pos=7, len=1).numpy())
print("The unit is specified as UTF8_CHAR")
print("It takes up 4 bytes")
print(tf.strings.substr(thanks, pos=7, len=1, unit='UTF8_CHAR').numpy())

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

The default unit is byte
When len is 1, a single byte is returned
b'\xf0'
The unit is specified as UTF8_CHAR
It takes up 4 bytes
b'\xf0\x9f\x98\x8a'

Explanation

  • The tf.strings.substr operation accepts a "unit" parameter, which is 'BYTE' by default and can be set to 'UTF8_CHAR'.
  • The "unit" value determines how the "pos" and "len" parameters are interpreted: as byte offsets or as character offsets.
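To make the effect of "unit" concrete, here is a minimal pure-Python sketch of the two offset interpretations. The helper name substr_sketch and the sample input are hypothetical; this only mimics the behaviour described above and is not how TensorFlow implements the operation.

```python
def substr_sketch(data: bytes, pos: int, length: int, unit: str = "BYTE") -> bytes:
    """Hypothetical helper mimicking how `unit` changes the meaning of offsets."""
    if unit == "BYTE":
        # Offsets count raw bytes, so a multi-byte character can be cut apart.
        return data[pos:pos + length]
    if unit == "UTF8_CHAR":
        # Offsets count whole characters: decode, slice, then re-encode.
        return data.decode("UTF-8")[pos:pos + length].encode("UTF-8")
    raise ValueError(f"unsupported unit: {unit}")

# Assumed sample input: "Thanks " followed by one emoji (U+1F60A).
thanks = "Thanks \U0001F60A".encode("UTF-8")

first_byte = substr_sketch(thanks, pos=7, length=1)
whole_char = substr_sketch(thanks, pos=7, length=1, unit="UTF8_CHAR")
print(first_byte)  # only the first byte of the emoji
print(whole_char)  # all four bytes of the emoji
```

With byte offsets, position 7 lands on the first byte of the emoji and a length of 1 returns just that byte. With character offsets, the same position and length return the complete four-byte character.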

Updated on: 20-Feb-2021
