How can TensorFlow and Estimator be used to define an input function for training and evaluation of a dataset?


TensorFlow and the Estimator API can be used to define input functions for training and evaluating a model on a dataset. An input function builds a dataset in which each element pairs a dictionary of features with its label. This is achieved using the 'from_tensor_slices' method. The function can also shuffle the data, repeat it for a given number of epochs, and group it into batches. In the end, it returns the resulting dataset as output. An input function is created by passing the features and labels (for example, the training data) to a factory function.


The Keras Sequential API is helpful for building a model from a plain stack of layers, where every layer has exactly one input tensor and one output tensor.

A neural network that contains at least one convolutional layer is known as a convolutional neural network. We can use a convolutional neural network to build a learning model.

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

Let us understand how Estimators can be used.

An Estimator is TensorFlow's high-level representation of a complete model. It is designed for easy scaling and asynchronous training.

We will train a logistic regression model using the tf.estimator API. This model serves as a baseline for other algorithms. We use the Titanic dataset, with the goal of predicting passenger survival given characteristics such as gender, age, and class.

Estimators use feature columns to describe how the model would interpret the raw input features. An Estimator expects a vector of numeric inputs, and feature columns will help describe how the model should convert every feature in the dataset.
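To make the idea of converting raw features into a numeric vector concrete, here is a minimal pure-Python sketch. It does not use the tf.feature_column API; it only imitates the kind of transformation an indicator (one-hot) column conceptually performs. The vocabulary, the `one_hot` and `encode_row` helpers, and the sample record are all hypothetical and chosen for illustration.

```python
# Hypothetical vocabulary for a categorical "class" feature (illustrative values).
CLASS_VOCAB = ["First", "Second", "Third"]


def one_hot(value, vocabulary):
    """Return a list of floats with 1.0 at the position of `value`, else 0.0."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_row(row):
    """Turn one raw passenger record into a flat numeric vector."""
    # Numeric features pass through unchanged; categorical ones are expanded.
    return [float(row["age"])] + one_hot(row["class"], CLASS_VOCAB)


encoded = encode_row({"age": 22, "class": "Third"})
print(encoded)  # [22.0, 0.0, 0.0, 1.0]
```

A value missing from the vocabulary maps to an all-zero vector in this sketch; real feature columns offer their own options for handling such values.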

Example

import tensorflow as tf

# dftrain, y_train, dfeval and y_eval are assumed to be loaded earlier (Titanic features and labels).
print("The entire batch of dataset is used since it is small")
NUM_EXAMPLES = len(y_train)
print("Function that iterates through the dataset, in memory training")
def make_input_fn(X, y, n_epochs=None, shuffle=True):
    def input_fn():
        # Each element pairs a dict of feature values with its label.
        dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
        if shuffle:
            dataset = dataset.shuffle(NUM_EXAMPLES)
        # n_epochs=None repeats the data indefinitely, which suits training.
        dataset = dataset.repeat(n_epochs)
        # The whole dataset is served as a single batch.
        dataset = dataset.batch(NUM_EXAMPLES)
        return dataset
    return input_fn
print("Training and evaluation input function have been defined")
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, shuffle=False, n_epochs=1)

Code credit: https://www.tensorflow.org/tutorials/estimator/boosted_trees

Output

The entire batch of dataset is used since it is small
Function that iterates through the dataset, in memory training
Training and evaluation input function have been defined

Explanation

  • Input functions need to be created first.
  • They specify how the data will be read into the model for training and inference.
  • The from_tensor_slices method in the tf.data API is used to read data directly from a Pandas DataFrame.
  • This approach is suited to smaller, in-memory datasets.
  • For larger datasets that do not fit in memory, the tf.data API supports reading from a variety of file formats, such as CSV.
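The behaviour described above can be checked on a tiny hand-made DataFrame. The sketch below assumes TensorFlow 2.x (eager execution) and pandas are installed. The `features` and `labels` values are invented stand-ins for the Titanic data, and the batch size is taken from `len(y)` rather than a global constant so the block runs on its own.

```python
import pandas as pd
import tensorflow as tf

# Tiny stand-ins for dftrain / y_train (values are illustrative only).
features = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "sex": ["male", "female", "female", "male"],
})
labels = pd.Series([0, 1, 1, 0])


def make_input_fn(X, y, n_epochs=None, shuffle=True):
    def input_fn():
        # Each element is a (dict of feature values, label) pair.
        dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
        if shuffle:
            dataset = dataset.shuffle(len(y))
        dataset = dataset.repeat(n_epochs)
        # Serve the whole dataset as one batch.
        return dataset.batch(len(y))
    return input_fn


# Evaluation: one pass, no shuffling, so exactly one batch in the original order.
eval_batches = list(make_input_fn(features, labels, n_epochs=1, shuffle=False)())
feature_batch, label_batch = eval_batches[0]
print(len(eval_batches))
print(sorted(feature_batch.keys()))
print(label_batch.numpy().tolist())

# Training: n_epochs=None repeats forever, so take() is needed to stop iterating.
train_batches = list(make_input_fn(features, labels)().take(3))
print(len(train_batches))
```

Because the training dataset repeats indefinitely, the number of batches consumed during training has to be limited elsewhere; here `take(3)` plays that role.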

Updated on: 25-Feb-2021
