Write Python code to convert a given series into dummy variables and drop any NaN values if they exist

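Converting a pandas Series into dummy variables creates one binary column per unique value. The pd.get_dummies() function handles this conversion. With dummy_na=False (the default), NaN does not get a column of its own, so a NaN entry becomes a row of all zeros. To remove those rows entirely, call dropna() on the series before converting it.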

Understanding Dummy Variables

Dummy variables are binary (0 or 1) columns that represent categorical data. For example, a "Gender" series with values "Male" and "Female" becomes two columns: "Male" and "Female", where 1 indicates the presence of that category.
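
To make the idea concrete, the following minimal sketch builds the same kind of columns by hand, without pd.get_dummies(). The sample data and the name manual_dummies are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
gender = pd.Series(['Male', 'Female', 'Male'])

# Build one 0/1 column per category by comparing the series against each unique value
manual_dummies = pd.DataFrame(
    {value: (gender == value).astype(int) for value in sorted(gender.unique())}
)
print(manual_dummies)
```

Each row contains a single 1, in the column that matches the original value.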

Syntax

pd.get_dummies(data, dummy_na=False)

Parameters

The key parameter for handling NaN values:

  • dummy_na=False : Default. No column is created for NaN, so NaN entries produce rows of all zeros
  • dummy_na=True : Creates an additional column for NaN values

Example

Let's create a series with gender data including NaN values and convert it to dummy variables:

import pandas as pd
import numpy as np

gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)
print("Original Series:")
print(series)
Original Series:
0      Male
1    Female
2      Male
3    Female
4      Male
5       NaN
6    Female
7    Female
dtype: object

Converting to Dummy Variables

Now let's convert the series to dummy variables without a NaN column. Row 5, which held NaN, comes out as all zeros:

import pandas as pd
import numpy as np

gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)

# dtype=int gives 0/1 columns; pandas 2.0 and later default to True/False
dummy_vars = pd.get_dummies(series, dummy_na=False, dtype=int)
print("Dummy Variables:")
print(dummy_vars)
Dummy Variables:
   Female  Male
0       0     1
1       1     0
2       0     1
3       1     0
4       0     1
5       0     0
6       1     0
7       1     0
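
The call above keeps row 5 as an all-zero row instead of removing it. If the NaN entry should be dropped entirely, one option is to call dropna() before the conversion. The sketch below assumes that dropping the row is acceptable for your data; the name clean_series is illustrative.

```python
import pandas as pd
import numpy as np

gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)

# Remove missing entries first, so that no all-zero row is produced
clean_series = series.dropna()

# dtype=int is passed so the columns hold 0/1 rather than True/False
dummy_vars = pd.get_dummies(clean_series, dtype=int)
print(dummy_vars)
```

The result has seven rows, and index label 5 is absent. Chain reset_index(drop=True) if a contiguous index is needed.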

Comparison: With and Without NaN Column

Here's how the output differs when including NaN as a dummy variable:

import pandas as pd
import numpy as np

gender = ['Male', 'Female', 'Male', np.nan, 'Female']
series = pd.Series(gender)

print("Without NaN column (dummy_na=False):")
print(pd.get_dummies(series, dummy_na=False, dtype=int))

print("\nWith NaN column (dummy_na=True):")
print(pd.get_dummies(series, dummy_na=True, dtype=int))
Without NaN column (dummy_na=False):
   Female  Male
0       0     1
1       1     0
2       0     1
3       0     0
4       1     0

With NaN column (dummy_na=True):
   Female  Male  NaN
0       0     1    0
1       1     0    0
2       0     1    0
3       0     0    1
4       1     0    0
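
With dummy_na=False, the missing entries can still be located afterwards, because they are the only rows that sum to zero. The short check below illustrates this for the data above; the variable names are illustrative.

```python
import pandas as pd
import numpy as np

series = pd.Series(['Male', 'Female', 'Male', np.nan, 'Female'])
dummies = pd.get_dummies(series, dummy_na=False, dtype=int)

# A row that sums to 0 matched no category, which here means the value was missing
all_zero_rows = dummies.sum(axis=1) == 0
print(all_zero_rows.tolist())

# The all-zero rows line up with the missing entries in the original series
print(all_zero_rows.equals(series.isna()))
```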

Key Points

Parameter      | Behavior                          | Use Case
dummy_na=False | NaN values result in all zeros    | When NaN represents missing data to ignore
dummy_na=True  | Creates a separate column for NaN | When NaN is a meaningful category

Conclusion

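Use pd.get_dummies(series, dummy_na=False) to convert a categorical series into binary dummy variables without a separate NaN column. NaN entries then appear as all-zero rows; call series.dropna() first if those rows should be removed entirely. This encoding is a common step when preparing categorical data for machine learning algorithms.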

Updated on: 2026-03-25T16:34:50+05:30
