
Python Pandas - Basic Functionality
Pandas is a powerful Python library for data manipulation. Its two core data structures, Series (one-dimensional) and DataFrame (two-dimensional), are the main tools for storing, handling, and analyzing large datasets.
Understanding the basic functionality of Pandas, including its attributes and methods, is essential for managing data effectively. Attributes and methods give you quick insight into the structure and contents of your data, making it easier to understand and process. In this tutorial you will learn the basic attributes and methods that are most useful when working with these data structures.
Working with Attributes in Pandas
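Attributes in Pandas give you access to metadata about your Series and DataFrame objects, such as their data types, labels, and dimensions. Attributes are accessed without parentheses (for example, s.shape rather than s.shape()), and reading them does not modify the data.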
Series and DataFrame Attributes
Following are the widely used attributes of Series and DataFrame objects −
Sr.No. | Attribute | Description |
---|---|---|
1 | dtype / dtypes | dtype returns the data type of the elements in a Series. A DataFrame has no single dtype; use dtypes, which returns the data type of each column. |
2 | index | Provides the index (row labels) of the Series or DataFrame. |
3 | values | Returns the data in the Series or DataFrame, usually as a NumPy array. |
4 | shape | Returns a tuple with the dimensions of the object: (rows,) for a Series and (rows, columns) for a DataFrame. |
5 | ndim | Returns the number of dimensions of the object. A Series is always 1-dimensional and a DataFrame is 2-dimensional. |
6 | size | Gives the total number of elements in the object (rows × columns for a DataFrame). |
7 | empty | Returns True if the object is empty, that is, if any of its axes has length 0. |
8 | columns | Provides the column labels. This attribute is available only on DataFrame objects. |
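The empty attribute checks the length of the axes, not whether the values are missing. The short sketch below uses made-up sample objects to illustrate this; it relies on the behaviour described in the pandas documentation for empty, where an object containing only NaN values is not considered empty until those values are dropped.

```python
import numpy as np
import pandas as pd

# Column labels but no rows: the row axis has length 0, so the object is empty.
no_rows = pd.DataFrame(columns=["A", "B"])

# One element that happens to be missing: the axis has length 1, so it is not empty.
only_nan = pd.Series([np.nan])

# A Series created without data (dtype given explicitly to avoid a warning).
no_data = pd.Series(dtype="float64")

print(no_rows.empty, no_rows.shape)      # True (0, 2)
print(only_nan.empty, only_nan.size)     # False 1
print(only_nan.dropna().empty)           # True
print(no_data.empty, no_data.size)       # True 0
```

In other words, to test whether an object holds any usable values, drop the missing values first and then check empty.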
Example
Let's create a Pandas Series and explore these attributes.
import pandas as pd
import numpy as np

# Create a Series with random numbers
s = pd.Series(np.random.randn(4))

# Exploring attributes
print("Data type of Series:", s.dtype)
print("Index of Series:", s.index)
print("Values of Series:", s.values)
print("Shape of Series:", s.shape)
print("Number of dimensions of Series:", s.ndim)
print("Size of Series:", s.size)
print("Is Series empty?:", s.empty)
Because the Series holds random numbers, the values will differ on every run. A sample output is as follows −
Data type of Series: float64
Index of Series: RangeIndex(start=0, stop=4, step=1)
Values of Series: [-1.02016329  1.40840089  1.36293022  1.33091391]
Shape of Series: (4,)
Number of dimensions of Series: 1
Size of Series: 4
Is Series empty?: False
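The values attribute used above still works, but the pandas documentation recommends the to_numpy() method when you need a NumPy array. The following sketch, with an arbitrary sample Series, compares the two and shows the optional dtype argument of to_numpy().

```python
import numpy as np
import pandas as pd

# Arbitrary sample data with whole-number float values.
s = pd.Series([1.0, 2.0, 3.0])

via_values = s.values        # attribute used in the example above
via_to_numpy = s.to_numpy()  # method recommended by the pandas docs

print(type(via_to_numpy))                         # <class 'numpy.ndarray'>
print(np.array_equal(via_values, via_to_numpy))   # True

# to_numpy() can also convert the data to a requested dtype.
as_int = s.to_numpy(dtype="int64")
print(as_int, as_int.dtype)                       # [1 2 3] int64
```

For a plain float Series both return the same array; the method form is simply more explicit and accepts extra arguments.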
Example
The following example shows how these attributes work on a DataFrame object.
import pandas as pd
import numpy as np

# Create a DataFrame with random numbers
df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
print("DataFrame:")
print(df)

print("Results:")
print("Data types:", df.dtypes)
print("Index:", df.index)
print("Columns:", df.columns)
print("Values:")
print(df.values)
print("Shape:", df.shape)
print("Number of dimensions:", df.ndim)
print("Size:", df.size)
print("Is empty:", df.empty)
On executing the above code you will get output similar to the following (the random values will differ) −
DataFrame:
          A         B         C         D
0  2.161209 -1.671807 -1.020421 -0.287065
1  0.308136 -0.592368 -0.183193  1.354921
2 -0.963498 -1.768054 -0.395023 -2.454112
Results:
Data types: A    float64
B    float64
C    float64
D    float64
dtype: object
Index: RangeIndex(start=0, stop=3, step=1)
Columns: Index(['A', 'B', 'C', 'D'], dtype='object')
Values:
[[ 2.16120893 -1.67180742 -1.02042138 -0.28706468]
 [ 0.30813618 -0.59236786 -0.18319262  1.35492058]
 [-0.96349817 -1.76805364 -0.3950226  -2.45411245]]
Shape: (3, 4)
Number of dimensions: 2
Size: 12
Is empty: False
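Because every column in that example holds random floats, all the dtypes are float64. The sketch below uses a small, made-up DataFrame with mixed column types to show that dtypes reports one type per column, that each column (a Series) has its own single dtype, and that size equals rows multiplied by columns.

```python
import pandas as pd

# Hypothetical sample data with one text, one integer and one float column.
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# One dtype per column; a DataFrame has no single .dtype attribute.
print(df.dtypes)

# Each column is a Series, which does have a single .dtype.
print(df["Age"].dtype)      # int64
print(df["Rating"].dtype)   # float64

# size counts every cell, so it equals rows * columns.
rows, cols = df.shape
print(df.shape, df.size, rows * cols)   # (3, 3) 9 9
```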
Exploring Basic Methods in Pandas
Pandas offers several basic methods on both data structures that make it easy to take a quick look at your data. These methods help you summarize and explore its details with little effort.
Series and DataFrame Methods
Sr.No. | Method | Description |
---|---|---|
1 | head(n) | Returns the first n rows of the object. The default value of n is 5. |
2 | tail(n) | Returns the last n rows of the object. The default value of n is 5. |
3 | info() | Prints a concise summary of a DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. It prints the summary directly rather than returning it. |
4 | describe() | Generates descriptive statistics of the DataFrame or Series; for numeric data these are count, mean, std, min, the 25%, 50% and 75% percentiles, and max. |
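Both head() and tail() also accept values of n that are negative or larger than the number of rows. The following sketch, on an arbitrary sample Series, shows these cases as described in the pandas documentation for head() and tail().

```python
import pandas as pd

# Arbitrary sample data with six elements.
s = pd.Series([10, 20, 30, 40, 50, 60])

print(s.head(2).tolist())    # [10, 20]
print(s.tail(2).tolist())    # [50, 60]

# A negative n drops rows instead of selecting them.
print(s.head(-2).tolist())   # all but the last 2: [10, 20, 30, 40]
print(s.tail(-2).tolist())   # all but the first 2: [30, 40, 50, 60]

# Asking for more rows than exist simply returns the whole object.
print(len(s.head(100)))      # 6
```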
Example
Let us now create a Series and see how these basic methods work on it.
import pandas as pd
import numpy as np

# Create a Series with random numbers
s = pd.Series(np.random.randn(10))
print("Series:")
print(s)

# Using basic methods
print("First 5 elements of the Series:\n", s.head())
print("\nLast 3 elements of the Series:\n", s.tail(3))
print("\nDescriptive statistics of the Series:\n", s.describe())
Because the Series holds random numbers, your values will differ. A sample output is as follows −
Series:
0   -0.295898
1   -0.786081
2   -1.189834
3   -0.410830
4   -0.997866
5    0.084868
6    0.736541
7    0.133949
8    1.023674
9    0.669520
dtype: float64
First 5 elements of the Series:
0   -0.295898
1   -0.786081
2   -1.189834
3   -0.410830
4   -0.997866
dtype: float64

Last 3 elements of the Series:
7    0.133949
8    1.023674
9    0.669520
dtype: float64

Descriptive statistics of the Series:
count    10.000000
mean     -0.103196
std       0.763254
min      -1.189834
25%      -0.692268
50%      -0.105515
75%       0.535627
max       1.023674
dtype: float64
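The statistics above are the ones describe() produces for numeric data. For a Series of strings, describe() reports a different set: count, the number of unique values, the most frequent value (top), and how often it occurs (freq). A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up text data in which "red" is the most frequent value.
colors = pd.Series(["red", "blue", "red", "green", "red"])

summary = colors.describe()
print(summary)

print(summary["top"], summary["freq"])   # red 3
```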
Example
The following example shows how the basic methods work on a DataFrame object.
import pandas as pd
import numpy as np

# Create a dictionary of Series
data = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
        'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
        'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(data)
print("Our data frame is:\n")
print(df)

# Using basic methods
print("\nFirst 5 rows of the DataFrame:\n", df.head())
print("\nLast 3 rows of the DataFrame:\n", df.tail(3))
print("\nInfo of the DataFrame:")
df.info()  # info() prints its summary itself and returns None
print("\nDescriptive statistics of the DataFrame:\n", df.describe())
On executing the above code you will get the following output −
Our data frame is:

    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80

First 5 rows of the DataFrame:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20

Last 3 rows of the DataFrame:
    Name  Age  Rating
4  Steve   30     3.2
5  Smith   29     4.6
6   Jack   23     3.8

Info of the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    7 non-null      object
 1   Age     7 non-null      int64
 2   Rating  7 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 296.0+ bytes

Descriptive statistics of the DataFrame:
             Age    Rating
count   7.000000  7.000000
mean   25.857143  3.658571
std     2.734262  0.698628
min    23.000000  2.560000
25%    24.000000  3.220000
50%    25.000000  3.800000
75%    27.500000  4.105000
max    30.000000  4.600000
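By default describe() reports the 25th, 50th and 75th percentiles. Its percentiles parameter lets you choose others, and the median (50%) is always included. The sketch below reuses the Age and Rating values from the example above and asks for the 10th and 90th percentiles instead.

```python
import pandas as pd

# Age and Rating values taken from the example above.
df = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

# Request the 10th and 90th percentiles; describe() adds 50% automatically.
stats = df.describe(percentiles=[0.1, 0.9])
print(stats)
```

The rows of the result are count, mean, std, min, 10%, 50%, 90% and max.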