- Data Structure
- Networking
- RDBMS
- Operating System
- Java
- MS Excel
- iOS
- HTML
- CSS
- Android
- Python
- C Programming
- C++
- C#
- MongoDB
- MySQL
- Javascript
- PHP
- Physics
- Chemistry
- Biology
- Mathematics
- English
- Economics
- Psychology
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How can data be cleaned to predict the fuel efficiency with Auto MPG dataset using TensorFlow?
Tensorflow is a machine learning framework that is provided by Google. It is an open−source framework used in conjunction with Python to implement algorithms, deep learning applications and much more.
The ‘tensorflow’ package can be installed on Windows using the below line of code −
pip install tensorflow
Tensor is a data structure used in TensorFlow. It helps connect edges in a flow diagram. This flow diagram is known as the ‘Data flow graph’. Tensors are nothing but multidimensional array or a list.
The aim behind a regression problem is to predict the output of a continuous or discrete variable, such as a price, probability, whether it would rain or not and so on.
The dataset we use is called the ‘Auto MPG’ dataset. It contains fuel efficiency of 1970s and 1980s automobiles. It includes attributes like weight, horsepower, displacement, and so on. With this, we need to predict the fuel efficiency of specific vehicles.
We are using the Google Colaboratory to run the below code. Google Colab or Colaboratory helps run Python code over the browser and requires zero configuration and free access to GPUs (Graphical Processing Units). Colaboratory has been built on top of Jupyter Notebook.
Following is the code snippet wherein we will see how can data be cleaned to predict the fuel efficiency with Auto MPG dataset using TensorFlow −
Example
print("Data cleaning has begun") dataset.isna().sum() dataset = dataset.dropna() dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'}) print("Data cleaning complete!") dataset = pd.get_dummies(dataset, prefix='', prefix_sep='') print("A sample of dataset after data cleaning :") dataset.head(4)
Code credit − https://www.tensorflow.org/tutorials/keras/regression
Output
Data cleaning has begun Data cleaning complete! A sample of dataset after data cleaning −
MPG | Cylinders | Displacement | horsepower | weight | Acceleration | Model Year | Europe | Japan | USA | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 0 | 0 | 1 |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 0 | 0 | 1 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 0 | 0 | 1 |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 0 | 0 | 1 |
Explanation
The data cleaning begins by deleting ‘nan’ present in the dataset.
The ‘map’ function is used to map label to column names.
A sample of the dataet after data cleaning is displayed on the console.