Write a program in Python to find which column has the minimum number of missing values in a given dataframe

When working with data analysis, it's common to encounter missing values in DataFrames. Sometimes you need to identify which column has the minimum number of missing values to help guide your data cleaning strategy.

Problem Statement

Given a DataFrame with missing values, we need to find which column has the fewest NaN values. This is useful for determining which columns are most complete in your dataset.

Sample DataFrame

Let's start by creating a sample DataFrame with missing values to demonstrate the solution ?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
    'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
    'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]
})

print("DataFrame is:")
print(df)
DataFrame is:
     Id    Salary   Age
0   1.0   20000.0  22.0
1   2.0       NaN  23.0
2   3.0   50000.0   NaN
3   NaN   40000.0  25.0
4   5.0   80000.0   NaN
5   6.0       NaN  25.0
6   7.0  350000.0  26.0
7   8.0   55000.0  27.0
8   9.0   60000.0   NaN
9  10.0   70000.0  24.0

Solution Approach

To find the column with minimum missing values, we'll follow these steps ?

  • Count the number of missing values in each column using isnull().sum()

  • Find the column name with the minimum count using idxmin()

Method 1: Using isnull() and idxmin()

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
    'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
    'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]
})

print("DataFrame is:")
print(df)

# Count missing values per column
missing_counts = df.isnull().sum()
print("\nMissing values per column:")
print(missing_counts)

# Find column with minimum missing values
min_missing_column = missing_counts.idxmin()
print(f"\nColumn with minimum missing values: {min_missing_column}")
DataFrame is:
     Id    Salary   Age
0   1.0   20000.0  22.0
1   2.0       NaN  23.0
2   3.0   50000.0   NaN
3   NaN   40000.0  25.0
4   5.0   80000.0   NaN
5   6.0       NaN  25.0
6   7.0  350000.0  26.0
7   8.0   55000.0  27.0
8   9.0   60000.0   NaN
9  10.0   70000.0  24.0

Missing values per column:
Id        1
Salary    2
Age       3
dtype: int64

Column with minimum missing values: Id

Method 2: Using apply() with Lambda

Alternative approach using apply() function with a lambda expression ?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
    'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
    'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]
})

# Using apply with lambda
missing_counts = df.apply(lambda x: x.isnull().sum(), axis=0)
min_missing_column = missing_counts.idxmin()

print(f"Column with minimum missing values: {min_missing_column}")
print(f"Number of missing values: {missing_counts[min_missing_column]}")
Column with minimum missing values: Id
Number of missing values: 1

Complete Analysis

Here's a comprehensive function that provides detailed analysis of missing values ?

import pandas as pd
import numpy as np

def analyze_missing_values(df):
    missing_counts = df.isnull().sum()
    missing_percentages = (missing_counts / len(df)) * 100
    
    # Create summary DataFrame
    summary = pd.DataFrame({
        'Missing_Count': missing_counts,
        'Missing_Percentage': missing_percentages.round(2)
    })
    
    print("Missing Values Summary:")
    print(summary)
    
    min_column = missing_counts.idxmin()
    min_count = missing_counts[min_column]
    
    print(f"\nColumn with minimum missing values: '{min_column}' ({min_count} missing)")
    
    return min_column

# Test the function
df = pd.DataFrame({
    'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
    'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
    'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]
})

result = analyze_missing_values(df)
Missing Values Summary:
        Missing_Count  Missing_Percentage
Id                  1               10.0
Salary              2               20.0
Age                 3               30.0

Column with minimum missing values: 'Id' (1 missing)

Key Points

  • isnull().sum() counts missing values per column

  • idxmin() returns the index (column name) with minimum value

  • Both approaches yield the same result; choose based on preference

  • Consider analyzing missing percentages for better insights

Conclusion

Use df.isnull().sum().idxmin() to quickly find the column with minimum missing values. This approach helps identify the most complete columns in your dataset for further analysis.

Updated on: 2026-03-25T16:28:57+05:30

432 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements