Percentile Rank of a Column in a Pandas DataFrame
The percentile rank shows what percentage of values in a dataset are less than or equal to a given value. In pandas, we can calculate percentile ranks using the rank() method or scipy's percentileofscore() function.
What is Percentile Rank?
For example, a student at the 80th percentile has a score greater than or equal to 80% of the scores in the dataset, their own score included.
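As a minimal sketch of this definition, the percentile rank can be computed by hand as the share of values that are less than or equal to a given value. The helper name percentile_rank is hypothetical and used only for illustration; the scores are the same sample values used in the examples in this article.

```python
def percentile_rank(values, x):
    """Return the percentage of values that are less than or equal to x."""
    count_le = sum(1 for v in values if v <= x)
    return 100 * count_le / len(values)

scores = [55, 92, 68, 70, 88]

# 55, 68 and 70 are <= 70, so 3 of 5 values qualify
print(percentile_rank(scores, 70))  # 60.0
```

This hand-rolled version counts ties as "less than or equal", which is only one of several possible conventions.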
Using rank() Method
The most common approach is pandas' rank() method with the pct=True parameter:
import pandas as pd
# Create sample DataFrame
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88]}
df = pd.DataFrame(data)
# Calculate percentile rank
df['Percentile_Rank'] = df['Score'].rank(pct=True)
print(df)
    Name  Score  Percentile_Rank
0    Ram     55              0.2
1  Shyam     92              1.0
2  Shrey     68              0.4
3  Mohan     70              0.6
4  Navya     88              0.8
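The sample scores above contain no ties. As a small sketch of what happens when values repeat, the example below uses made-up scores and the method parameter of rank(), which defaults to 'average'. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical scores with a tie at 70
tied = pd.Series([50, 70, 70, 90])

avg_rank = tied.rank(pct=True)                # default: method='average'
max_rank = tied.rank(pct=True, method='max')  # tied values share the highest rank
min_rank = tied.rank(pct=True, method='min')  # tied values share the lowest rank

print(avg_rank.tolist())  # [0.25, 0.625, 0.625, 1.0]
print(max_rank.tolist())  # [0.25, 0.75, 0.75, 1.0]
print(min_rank.tolist())  # [0.25, 0.5, 0.5, 1.0]
```

With method='max', each result equals the fraction of values less than or equal to that value, which matches the definition given at the start of this article.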
Converting to Percentage
To express percentile ranks as percentages, multiply by 100:
import pandas as pd
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88]}
df = pd.DataFrame(data)
# Calculate percentile rank as percentage
df['Percentile_Rank'] = (df['Score'].rank(pct=True) * 100).round(1)
print(df)
    Name  Score  Percentile_Rank
0    Ram     55             20.0
1  Shyam     92            100.0
2  Shrey     68             40.0
3  Mohan     70             60.0
4  Navya     88             80.0
Using scipy.stats.percentileofscore
For more control over how the percentile is calculated, particularly for tied values, use scipy's percentileofscore(), which returns a value on a 0 to 100 scale:
import pandas as pd
from scipy.stats import percentileofscore
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Balance': [5500, 9200, 6800, 7000, 8800]}
df = pd.DataFrame(data)
# Calculate percentile rank using scipy
df['Percentile_Rank'] = df['Balance'].apply(
    lambda x: percentileofscore(df['Balance'], x)
)
print(df)
    Name  Balance  Percentile_Rank
0    Ram     5500             20.0
1  Shyam     9200            100.0
2  Shrey     6800             40.0
3  Mohan     7000             60.0
4  Navya     8800             80.0
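The extra control comes mainly from the kind parameter of percentileofscore(), which accepts 'rank' (the default), 'weak', 'strict' and 'mean'. The sketch below uses made-up balances with a tie to show how the four options differ. The variable names are illustrative only.

```python
from scipy.stats import percentileofscore

# Hypothetical balances with a tie at 7000
balances = [5000, 7000, 7000, 9000]

results = {
    kind: float(percentileofscore(balances, 7000, kind=kind))
    for kind in ('rank', 'weak', 'strict', 'mean')
}

for kind, value in results.items():
    print(kind, value)
# rank 62.5    (average percentile of the tied values)
# weak 75.0    (share of values <= 7000)
# strict 25.0  (share of values < 7000)
# mean 50.0    (average of weak and strict)
```

When the data has no ties and the score is present in the data, 'rank' and 'weak' return the same value, which is why the output above matches the rank() results.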
Comparison of Methods
| Method | Output Range | Best For |
|---|---|---|
| rank(pct=True) | 0.0 to 1.0 | Simple percentile ranks |
| percentileofscore() | 0 to 100 | More control over calculation |
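As a quick sanity check, the following sketch compares the two methods on the sample scores, assuming default settings for both (method='average' in pandas, kind='rank' in scipy). After scaling the pandas result by 100, the values should agree.

```python
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

df = pd.DataFrame({'Score': [55, 92, 68, 70, 88]})
values = df['Score'].to_numpy()

# pandas returns fractions, so scale to the 0-100 range used by scipy
pandas_pct = df['Score'].rank(pct=True) * 100
scipy_pct = df['Score'].apply(lambda x: percentileofscore(values, x))

# Compare with a tolerance to avoid floating-point surprises
methods_agree = np.allclose(pandas_pct, scipy_pct)
print(methods_agree)
```

The pandas call is vectorized, while the scipy version calls percentileofscore() once per row, so rank(pct=True) is usually the more practical choice for large columns.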
Conclusion
Use rank(pct=True) for simple percentile rank calculations in pandas. When you need a different definition for tied values (through its kind parameter) or output directly on a 0 to 100 scale, use scipy's percentileofscore() function.
