Write Python code to convert a given series into dummy variables and drop any NaN values if they exist
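Converting a Pandas Series into dummy variables creates one binary column for each unique value. The pd.get_dummies() function handles this conversion. With dummy_na=False (the default), no column is created for NaN, and a NaN entry becomes a row of all zeros. To remove such rows entirely, call dropna() on the series before converting.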
Understanding Dummy Variables
Dummy variables are binary (0 or 1) columns that represent categorical data. For example, a "Gender" series with values "Male" and "Female" becomes two columns: "Male" and "Female", where 1 indicates the presence of that category.
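To make the idea concrete, here is a minimal sketch that builds the indicator columns by hand, comparing the series against each unique value. The sample data and the name manual_dummies are hypothetical, chosen only for illustration; in practice pd.get_dummies() does this work for you.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
gender = pd.Series(["Male", "Female", "Male"])

# One indicator column per unique value: 1 where the entry matches, 0 otherwise
manual_dummies = pd.DataFrame(
    {value: (gender == value).astype(int) for value in sorted(gender.unique())}
)
print(manual_dummies)
```

Each row contains a single 1, in the column that matches the original value.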
Syntax
pd.get_dummies(data, dummy_na=False)
Parameters
The key parameter for handling NaN values:
- dummy_na=False: Creates no column for NaN; entries that are NaN become rows of all zeros (this is the default)
- dummy_na=True: Creates an additional column for NaN values
Example
Let's create a series with gender data including NaN values and convert it to dummy variables:
import pandas as pd
import numpy as np
gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)
print("Original Series:")
print(series)
Original Series:
0      Male
1    Female
2      Male
3    Female
4      Male
5       NaN
6    Female
7    Female
dtype: object
Converting to Dummy Variables
Now let's convert the series to dummy variables. With dummy_na=False, no NaN column is created, and the NaN entry at index 5 becomes a row of zeros:
import pandas as pd
import numpy as np
gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)
# dtype=int gives 0/1 output; pandas 2.0 and later default to True/False
dummy_vars = pd.get_dummies(series, dummy_na=False, dtype=int)
print("Dummy Variables:")
print(dummy_vars)
Dummy Variables:
   Female  Male
0       0     1
1       1     0
2       0     1
3       1     0
4       0     1
5       0     0
6       1     0
7       1     0
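Note that row 5, the missing entry, is still present in this output as a row of zeros. If the goal is to remove it altogether, one option is to call dropna() on the series before converting. The following is a minimal sketch of that approach; the name clean_series is introduced here for illustration, and dtype=int is used only so the result prints as 0/1.

```python
import numpy as np
import pandas as pd

gender = ['Male', 'Female', 'Male', 'Female', 'Male', np.nan, 'Female', 'Female']
series = pd.Series(gender)

# Remove missing entries first, so no all-zero row is produced for them
clean_series = series.dropna()

# dtype=int forces 0/1 output; recent pandas versions default to True/False
dummy_vars = pd.get_dummies(clean_series, dtype=int)
print(dummy_vars)
```

The result has seven rows instead of eight, and index label 5 no longer appears. If a continuous index is needed afterwards, reset_index(drop=True) can be applied to the result.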
Comparison: With and Without NaN Column
Here's how the output differs when including NaN as a dummy variable:
import pandas as pd
import numpy as np
gender = ['Male', 'Female', 'Male', np.nan, 'Female']
series = pd.Series(gender)
# dtype=int gives 0/1 output; pandas 2.0 and later default to True/False
print("Without NaN column (dummy_na=False):")
print(pd.get_dummies(series, dummy_na=False, dtype=int))
print("\nWith NaN column (dummy_na=True):")
print(pd.get_dummies(series, dummy_na=True, dtype=int))
Without NaN column (dummy_na=False):
   Female  Male
0       0     1
1       1     0
2       0     1
3       0     0
4       1     0

With NaN column (dummy_na=True):
   Female  Male  NaN
0       0     1    0
1       1     0    0
2       0     1    0
3       0     0    1
4       1     0    0
Key Points
| Parameter | Behavior | Use Case |
|---|---|---|
| dummy_na=False | NaN values result in all zeros | When NaN represents missing data to ignore |
| dummy_na=True | Creates separate column for NaN | When NaN is a meaningful category |
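Because dummy_na=False leaves an all-zero row for each missing entry, those rows can be located after the fact by checking the row sums. The sketch below illustrates this under that assumption; the name all_zero_rows is hypothetical.

```python
import numpy as np
import pandas as pd

series = pd.Series(['Male', 'Female', 'Male', np.nan, 'Female'])
dummies = pd.get_dummies(series, dummy_na=False, dtype=int)

# A row that sums to zero matched no category, i.e. it was NaN in the input
all_zero_rows = dummies.sum(axis=1) == 0
print(all_zero_rows.tolist())

# Keep only the rows that belong to an actual category
print(dummies[~all_zero_rows])
```

The mask flags the same positions as series.isna(), so either one can be used to filter the encoded rows.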
Conclusion
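Use pd.get_dummies(series, dummy_na=False) to convert a categorical series into binary dummy variables. With this setting, NaN entries get no column of their own and appear as all-zero rows; call series.dropna() first if those rows should be removed entirely. This encoding is a common step in preparing categorical data for machine learning algorithms.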
