Seaborn - Importing Datasets and Libraries



In this chapter, we will discuss how to import datasets and libraries. Let us begin with the libraries.

Importing Libraries

Let us start by importing Pandas, which is a great library for managing relational (table-format) datasets. Seaborn comes in handy when dealing with DataFrames, which are the most widely used data structure for data analysis.

The following command will help you import Pandas:

# Pandas for managing datasets
import pandas as pd

Now, let us import the Matplotlib library, which helps us customize our plots.

# Matplotlib for additional customization
from matplotlib import pyplot as plt

We will import the Seaborn library with the following command:

# Seaborn for plotting and styling
import seaborn as sb
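
If any of the imports above fails, the library is probably not installed. The following is a minimal sketch for checking this up front. It uses only the standard library, and the helper name missing_packages is hypothetical, not part of Pandas, Matplotlib or Seaborn.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec() returns None when a top-level package is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three libraries imported in this chapter
required = ["pandas", "matplotlib", "seaborn"]
missing = missing_packages(required)

if missing:
    print("Install these libraries first:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Checking with find_spec() avoids actually importing the libraries, so the check itself cannot fail with an ImportError.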

Importing Datasets

We have imported the required libraries. In this section, we will understand how to import the required datasets.

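Seaborn provides access to a few example datasets. They are not bundled with the library itself: the load_dataset() function downloads the requested dataset from Seaborn's online data repository, so an internet connection is required, and by default it keeps a cached copy on your machine for later calls.

You can use any of these datasets for your learning. The following function loads the required dataset: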

load_dataset()

Importing Data as Pandas DataFrame

In this section, we will import a dataset. The load_dataset() function returns it as a Pandas DataFrame, so every Pandas DataFrame method works on it.

The following lines of code will help you import the dataset:

# Load the 'tips' example dataset as a Pandas DataFrame
import seaborn as sb
df = sb.load_dataset('tips')
# Show the first five rows
print(df.head())

The above lines of code will generate the following output:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
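
Because the result is an ordinary Pandas DataFrame, the usual DataFrame operations apply to it. The sketch below illustrates this on the five rows shown above. The rows are rebuilt by hand so that the example runs without an internet connection, and the tip_pct column is a hypothetical addition for illustration, not part of the tips dataset.

```python
import pandas as pd

# The five rows printed above, rebuilt by hand so the example runs offline
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# Number of rows and columns
print(df.shape)

# Average tip over all rows
mean_tip = df["tip"].mean()
print(mean_tip)

# Average bill per group
avg_bill = df.groupby("sex")["total_bill"].mean()
print(avg_bill)

# Derived column (hypothetical name): tip as a percentage of the bill
df["tip_pct"] = 100 * df["tip"] / df["total_bill"]
print(df.head())
```

The same calls work unchanged on the full DataFrame returned by sb.load_dataset('tips').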

To view all the datasets available through Seaborn, use the get_dataset_names() function as shown below:

import seaborn as sb
print(sb.get_dataset_names())

The above lines of code will return the list of available datasets. The list can grow as datasets are added to the repository; for example:

['anscombe', 'attention', 'brain_networks', 'car_crashes', 'dots',
'exercise', 'flights', 'fmri', 'gammas', 'iris', 'planets', 'tips',
'titanic']

DataFrames store data in rectangular grids, which makes the data easy to inspect. Each row of the grid contains the values of one instance (observation), and each column is a vector that holds the data for one variable. Because each column has its own data type, the values within a single row do not need to share a type: they can be numeric, character, logical, and so on. In Python, DataFrames are provided by the Pandas library, where they are defined as two-dimensional labeled data structures with potentially different types of columns.
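
The point about column types can be seen in a small sketch. The two rows reuse values from the tips output, while the is_smoker column is a hypothetical addition that supplies a logical (boolean) column for illustration.

```python
import pandas as pd

# Each column holds one variable and has its own data type
df = pd.DataFrame({
    "total_bill": [16.99, 10.34],   # floating-point numbers
    "sex": ["Female", "Male"],      # text
    "size": [2, 3],                 # integers
    "is_smoker": [False, False],    # logical values (hypothetical column)
})

# One data type per column
print(df.dtypes)

# A single row mixes values of all these types
first_row = df.iloc[0]
print(first_row)
```

The exact name Pandas reports for the text column's type can differ between Pandas versions, so the output of df.dtypes is not reproduced here.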

For more details on DataFrames, visit our tutorial on pandas.
