
- Python Pandas Tutorial
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
- Python Pandas Useful Resources
- Python Pandas - Quick Guide
- Python Pandas - Cheatsheet
- Python Pandas - Useful Resources
- Python Pandas - Discussion
Python Pandas - Concatenation
Concatenation in Pandas refers to the process of joining two or more Pandas objects (like DataFrames or Series) along a specified axis. This operation is very useful when you need to merge data from different sources or datasets.
The primary tool for this operation is pd.concat() function, which can useful for Series, DataFrame objects, whether you're combining rows or columns. Concatenation in Pandas involves combining multiple DataFrame or Series objects either row-wise or column-wise.
In this tutorial, we'll explore how to concatenate Pandas objects using the pd.concat() function. By discussing the different scenarios including concatenating along rows, using keys to distinguish concatenated DataFrames, ignoring indexes during concatenation, and concatenating along columns.
Understanding the pd.concat() Function
The pandas.concat() function is the primary method used for concatenation in Pandas. It allows you to concatenate pandas objects along a particular axis with various options for handling indexes.
The syntax of the pd.concat() functions as follows −
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)
Where,
objs: This is a sequence or mapping of Series, DataFrame, or Panel objects.
axis: {0, 1, ...}, default 0. This is the axis to concatenate along.
join: {"inner", "outer"}, default "outer". How to handle indexes on other axis(es). Outer for union and inner for intersection.
ignore_index: boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
keys: Used to create a hierarchical index along the concatenation axis.
levels: Specific levels to use for the MultiIndex in the result.
names: Names for the levels in the resulting hierarchical index.
verify_integrity: If True, checks for duplicate entries in the new axis and raises an error if duplicates are found.
sort: When combining DataFrames with unaligned columns, this parameter ensures the columns are sorted.
copy: default None. If False, do not copy data unnecessarily.
The concat() function does all of the heavy lifting of performing concatenation operations along an axis. Let us create different objects and do concatenation.
Example: Concatenating DataFrames
In this example, the two DataFrames are concatenated along rows, with the resulting DataFrame having duplicated indices.
import pandas as pd # Creating two DataFrames one = pd.DataFrame({ 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5'], 'Marks_scored':[98,90,87,69,78]}, index=[1,2,3,4,5]) two = pd.DataFrame({ 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5'], 'Marks_scored':[89,80,79,97,88]}, index=[1,2,3,4,5]) # Concatenating DataFrames result = pd.concat([one, two]) print(result)
Its output is as follows −
Name subject_id Marks_scored 1 Alex sub1 98 2 Amy sub2 90 3 Allen sub4 87 4 Alice sub6 69 5 Ayoung sub5 78 1 Billy sub2 89 2 Brian sub4 80 3 Bran sub3 79 4 Bryce sub6 97 5 Betty sub5 88
Example: Concatenating with Keys
If you want to distinguish between the concatenated DataFrames, you can use the keys parameter to associate specific keys with each part of the DataFrame.
import pandas as pd one = pd.DataFrame({ 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5'], 'Marks_scored':[98,90,87,69,78]}, index=[1,2,3,4,5]) two = pd.DataFrame({ 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5'], 'Marks_scored':[89,80,79,97,88]}, index=[1,2,3,4,5]) print(pd.concat([one,two],keys=['x','y']))
Its output is as follows −
Name subject_id Marks_scored x 1 Alex sub1 98 2 Amy sub2 90 3 Allen sub4 87 4 Alice sub6 69 5 Ayoung sub5 78 y 1 Billy sub2 89 2 Brian sub4 80 3 Bran sub3 79 4 Bryce sub6 97 5 Betty sub5 88
Here, the x and y keys create a hierarchical index, allowing easy identification of which original DataFrame each row came from.
Example: Ignoring Indexes During Concatenation
If the resultant object has to follow its own indexing, set ignore_index to True.
import pandas as pd one = pd.DataFrame({ 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5'], 'Marks_scored':[98,90,87,69,78]}, index=[1,2,3,4,5]) two = pd.DataFrame({ 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5'], 'Marks_scored':[89,80,79,97,88]}, index=[1,2,3,4,5]) print(pd.concat([one,two],keys=['x','y'],ignore_index=True))
Its output is as follows −
Name subject_id Marks_scored 0 Alex sub1 98 1 Amy sub2 90 2 Allen sub4 87 3 Alice sub6 69 4 Ayoung sub5 78 5 Billy sub2 89 6 Brian sub4 80 7 Bran sub3 79 8 Bryce sub6 97 9 Betty sub5 88
Observe, the index changes completely and the Keys are also overridden.
Example: Concatenating Along Columns
Instead of concatenating along rows, you can concatenate along columns by setting the axis parameter to 1.
import pandas as pd one = pd.DataFrame({ 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5'], 'Marks_scored':[98,90,87,69,78]}, index=[1,2,3,4,5]) two = pd.DataFrame({ 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5'], 'Marks_scored':[89,80,79,97,88]}, index=[1,2,3,4,5]) print(pd.concat([one,two],axis=1))
Its output is as follows −
Name subject_id Marks_scored Name subject_id Marks_scored 1 Alex sub1 98 Billy sub2 89 2 Amy sub2 90 Brian sub4 80 3 Allen sub4 87 Bran sub3 79 4 Alice sub6 69 Bryce sub6 97 5 Ayoung sub5 78 Betty sub5 88