How to disallow duplicate labels in a Pandas DataFrame?
By default, Pandas allows duplicate labels in a DataFrame. This can cause problems, because some Pandas methods do not work when row or column labels are duplicated. In this article, we will see how to prevent Pandas from allowing duplicate labels, so that they are caught with an error as soon as the DataFrame is created instead of surfacing later in the program.
Example
Take a look at the following code. We have a DataFrame in which two columns share the same name, "Name". Even so, Pandas creates and prints the DataFrame without raising any error.
import pandas as pd

df = pd.DataFrame(
    [
        ['John', 89, 'Maths'],
        ['Jacob', 23, 'Physics'],
        ['Tom', 100, 'Chemistry']],
    columns=['Name', 'Name', 'Subjects'])

print("Input DataFrame is:\n", df)
Output
It will produce the following output −
Input DataFrame is:
     Name  Name   Subjects
0   John    89      Maths
1  Jacob    23    Physics
2    Tom   100  Chemistry
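To illustrate why duplicates can be troublesome even though the DataFrame prints fine, here is a minimal sketch using a smaller frame with the same duplicated "Name" column. The variable names are chosen only for this illustration. It shows that the column index reports non-unique labels, and that selecting the duplicated label returns a DataFrame rather than a single Series.

```python
import pandas as pd

# Smaller illustrative frame with the same duplicated column label.
df = pd.DataFrame(
    [["John", 89, "Maths"], ["Jacob", 23, "Physics"]],
    columns=["Name", "Name", "Subjects"],
)

# The column index knows that its labels are not unique.
has_unique_columns = df.columns.is_unique

# Boolean mask marking the second and later occurrences of each label.
duplicate_mask = df.columns.duplicated()

# Selecting a duplicated label returns a DataFrame holding both columns,
# not the single Series that downstream code usually expects.
selected = df["Name"]

print(has_unique_columns)
print(list(duplicate_mask))
print(type(selected).__name__, selected.shape)
```

Code that assumes df["Name"] is a Series can then fail or behave unexpectedly, which is one reason to reject duplicates up front.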
Now, let's see how to prevent Pandas from allowing duplicate labels.
We can use .set_flags(allows_duplicate_labels=False), which checks the row and column labels of the DataFrame and raises an error if any of them are duplicated.
Now, run the same code with allows_duplicate_labels set to False and it will raise an error −
import pandas as pd

df = pd.DataFrame(
    [
        ['John', 89, 'Maths'],
        ['Jacob', 23, 'Physics'],
        ['Tom', 100, 'Chemistry']],
    columns=['Name', 'Name', 'Subjects']
).set_flags(allows_duplicate_labels=False)

print("Input DataFrame is:\n", df)
Now, it will catch the duplicate labels and produce the following error −
pandas.errors.DuplicateLabelError: Index has duplicates.
      positions
label
Name     [0, 1]
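In practice, you may want to handle this error instead of letting the program stop. The following is a minimal sketch, assuming a pandas version that provides set_flags (1.2 or later). The helper build_strict_frame and the column name "Marks" are hypothetical and used only for this illustration. It catches pd.errors.DuplicateLabelError for the duplicated columns, and then builds a valid frame to show that the flag is stored on the result.

```python
import pandas as pd

rows = [["John", 89, "Maths"], ["Jacob", 23, "Physics"]]


def build_strict_frame(columns):
    """Hypothetical helper: build a frame that rejects duplicate labels."""
    return pd.DataFrame(rows, columns=columns).set_flags(
        allows_duplicate_labels=False
    )


# Duplicated column labels: set_flags raises DuplicateLabelError.
try:
    build_strict_frame(["Name", "Name", "Subjects"])
    caught = False
except pd.errors.DuplicateLabelError as exc:
    caught = True
    print("Rejected:", exc)

# Unique column labels: the frame is created and keeps the flag.
ok = build_strict_frame(["Name", "Marks", "Subjects"])
flag = ok.flags.allows_duplicate_labels
print("allows_duplicate_labels:", flag)
```

Note that set_flags returns a new DataFrame, so the result must be assigned or chained as shown; the original object is left unchanged.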