How can data be imported to predict the fuel efficiency with Auto MPG dataset (basic regression) using TensorFlow?

TensorFlow is an open-source machine learning framework developed by Google. It is most often used from Python to build and train machine learning and deep learning models, both in research and in production.

The Auto MPG dataset contains fuel efficiency data for automobiles from the 1970s and early 1980s. It includes attributes such as weight, horsepower, displacement, and acceleration. With this dataset, we can use regression techniques to predict a vehicle's fuel efficiency from these attributes.

Installing TensorFlow

The tensorflow package can be installed using the following command:

pip install tensorflow

Understanding the Dataset

The Auto MPG dataset is a classic regression dataset containing the following features:

MPG: Miles per gallon (target variable)
Cylinders: Number of cylinders in the engine
Displacement: Engine displacement in cubic inches
Horsepower: Engine power
Weight: Vehicle weight in pounds
Acceleration: Time in seconds to accelerate from 0 to 60 mph
Model Year: Model year as two digits (for example, 70 for 1970)
Origin: Region of origin, stored as a numeric code (1 = USA, 2 = Europe, 3 = Japan)
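
Because Origin is a code rather than a quantity with a natural order, it is usually treated as a categorical column before modelling. The sketch below is one possible way to do that with pandas. It assumes the commonly used mapping 1 = USA, 2 = Europe, 3 = Japan and works on a tiny hand-made frame, not on the real dataset.

```python
import pandas as pd

# Toy frame standing in for the real data; the values are illustrative only.
toy = pd.DataFrame({"MPG": [18.0, 26.0, 24.0], "Origin": [1, 2, 3]})

# Assumed code-to-region mapping (not stored in the data file itself).
origin_names = {1: "USA", 2: "Europe", 3: "Japan"}
toy["Origin"] = toy["Origin"].map(origin_names)

# One indicator column per region; dtype=float keeps the result numeric.
encoded = pd.get_dummies(toy, columns=["Origin"], prefix="", prefix_sep="", dtype=float)
print(encoded)
```

Each row ends up with exactly one of the region columns set to 1.0, which avoids implying that region 3 is somehow "more" than region 1.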

Loading and Exploring the Dataset

Let's import the necessary libraries and load the Auto MPG dataset from the UCI repository:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Set numpy print options
np.set_printoptions(precision=3, suppress=True)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("TensorFlow version:", tf.__version__)

# Load the dataset
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 
                'Acceleration', 'Model Year', 'Origin']

print("Loading the Auto MPG dataset...")
raw_dataset = pd.read_csv(url, names=column_names, na_values='?', 
                         comment='\t', sep=' ', skipinitialspace=True)

dataset = raw_dataset.copy()
print("Dataset loaded successfully!")
print("\nFirst 5 rows of the dataset:")
print(dataset.head())

TensorFlow version: 2.4.0
Loading the Auto MPG dataset...
Dataset loaded successfully!

First 5 rows of the dataset:
    MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin
0  18.0          8         307.0       130.0  3504.0          12.0          70       1
1  15.0          8         350.0       165.0  3693.0          11.5          70       1
2  18.0          8         318.0       150.0  3436.0          11.0          70       1
3  16.0          8         304.0       150.0  3433.0          12.0          70       1
4  17.0          8         302.0       140.0  3449.0          10.5          70       1
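
The read_csv arguments used above are easier to follow on a small example. The sketch below parses two hand-written lines that imitate the layout of the raw file (space-padded numbers, a '?' for a missing value, then a tab and a quoted car name). The sample lines and the names "car a" and "car b" are made up for illustration.

```python
import io
import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Two illustrative lines in the assumed raw layout: numeric fields padded
# with spaces, '?' for a missing value, then a tab and the car name.
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"car a"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"car b"\n'
)

df = pd.read_csv(io.StringIO(sample), names=column_names,
                 na_values='?',          # treat '?' as a missing value (NaN)
                 comment='\t',           # ignore everything from the tab onward
                 sep=' ',                # fields are separated by spaces
                 skipinitialspace=True)  # skip the extra padding spaces
print(df)
```

The car name never reaches the DataFrame because the tab is treated as the start of a comment, and the '?' in the second line shows up as NaN in the Horsepower column.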

Dataset Information

Let's examine the basic statistics and structure of our dataset:

# Check dataset info
print("Dataset shape:", dataset.shape)
print("\nDataset info:")
dataset.info()  # info() prints its summary directly and returns None

print("\nBasic statistics:")
print(dataset.describe())

print("\nMissing values:")
print(dataset.isnull().sum())

Dataset shape: (398, 8)

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   MPG           398 non-null    float64
 1   Cylinders     398 non-null    int64  
 2   Displacement  398 non-null    float64
 3   Horsepower    392 non-null    float64
 4   Weight        398 non-null    float64
 5   Acceleration  398 non-null    float64
 6   Model Year    398 non-null    int64  
 7   Origin        398 non-null    int64  

Basic statistics:
            MPG   Cylinders  Displacement   Horsepower       Weight  Acceleration  Model Year      Origin
count   398.000     398.000       398.000      392.000      398.000       398.000     398.000     398.000
mean     23.446       5.472       194.412      104.469     2977.584        15.541      75.979       1.577
std       7.805       1.705       104.644       38.491      849.403         2.758       3.684       0.775

Missing values:
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
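
Beyond counting missing values per column, it can help to look at the affected rows themselves. The following is a minimal sketch on a toy frame with made-up values; the same expressions should work on the loaded dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing Horsepower value; the numbers are illustrative only.
toy = pd.DataFrame({
    "MPG": [18.0, 25.0, 22.0],
    "Horsepower": [130.0, np.nan, 95.0],
})

# Per-column count of missing entries.
missing_per_column = toy.isnull().sum()

# Boolean mask selecting the rows where any column is missing.
rows_with_gaps = toy[toy.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```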

Data Preprocessing

The Horsepower column has 6 missing values. The simplest way to handle them is to drop the affected rows:

# Remove rows with missing values
dataset_cleaned = dataset.dropna()

print("Dataset shape after removing missing values:", dataset_cleaned.shape)
print("Missing values after cleaning:", dataset_cleaned.isnull().sum().sum())

# Display correlation with target variable (MPG)
print("\nCorrelation with MPG:")
correlations = dataset_cleaned.corr()['MPG'].sort_values(ascending=False)
print(correlations)

Dataset shape after removing missing values: (392, 8)
Missing values after cleaning: 0

Correlation with MPG:
MPG             1.000000
Model Year      0.580541
Origin          0.565209
Acceleration    0.423329
Cylinders      -0.777618
Horsepower     -0.778427
Displacement   -0.805127
Weight         -0.832244

Key Insights

From the correlation analysis, we can observe:

Strong negative correlations: Weight (-0.83), Displacement (-0.81), Horsepower (-0.78), and Cylinders (-0.78) all have strong negative relationships with MPG
Positive correlations: Model Year (0.58) and Origin (0.57) show positive relationships with fuel efficiency. Origin is a category code, so its coefficient is only a rough indicator
Moderate correlation: Acceleration (0.42) has a moderate positive correlation
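
To make the coefficients above less of a black box, the sketch below computes a Pearson correlation by hand and compares it with pandas. It uses only the MPG and Weight values from the first five rows printed earlier, so it illustrates the calculation and is not expected to reproduce the full-dataset value of -0.83.

```python
import numpy as np
import pandas as pd

# MPG and Weight from the first five rows of the dataset shown earlier.
head = pd.DataFrame({
    "MPG": [18.0, 15.0, 18.0, 16.0, 17.0],
    "Weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0],
})

x = head["Weight"].to_numpy()
y = head["MPG"].to_numpy()

# Pearson r: sum of products of the centred values, divided by the
# square root of the product of their sums of squares.
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# pandas computes the same quantity by default (method="pearson").
r_pandas = head["Weight"].corr(head["MPG"])

print(r_manual, r_pandas)
```

Both values agree up to floating-point rounding, and the sign is negative even on this small sample: the heavier cars in these rows have lower MPG.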

Conclusion

The Auto MPG dataset has been loaded and explored. After dropping rows with missing values, it contains 392 complete records with 8 columns: the MPG target and 7 input features. Heavier cars with larger engines tend to have lower fuel efficiency. The cleaned dataset is now ready for building regression models to predict fuel efficiency using TensorFlow.
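
As a possible next step before modelling, the cleaned data would typically be split into training and test sets, with the MPG column separated out as the label. The sketch below shows one way to do this with pandas alone. The toy frame and its values are made up for illustration, and the 80/20 split and random_state=0 are assumptions, not requirements.

```python
import pandas as pd

# Toy stand-in for dataset_cleaned; the values are illustrative only.
toy = pd.DataFrame({
    "MPG": [18.0, 15.0, 18.0, 16.0, 17.0, 24.0, 22.0, 27.0, 26.0, 21.0],
    "Weight": [3504.0, 3693.0, 3436.0, 3433.0, 3449.0,
               2372.0, 2833.0, 2130.0, 1835.0, 2587.0],
})

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

# Separate the target column (MPG) from the input features.
train_features, train_labels = train.drop(columns="MPG"), train["MPG"]
test_features, test_labels = test.drop(columns="MPG"), test["MPG"]

print(len(train_features), len(test_features))
```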

AmitDiwan

Updated on: 2026-03-25T15:36:37+05:30
