How do StringDtype objects differ from object dtype in Python Pandas?


Pandas can not only include text data as an object, it also includes any other data that pandas don’t understand. This means, if you say when a column is an Object dtype, and it doesn’t mean all the values in that column will be a string or text data. In fact, they may be numbers, or a mixture of string, integers, and floats dtype. So with this incompatibility, we can not do any string operations on that column directly.

Due to this problem, string dtype is introduced from the pandas 1.0 version, but we need to define it explicitly.

See some examples to understand how StringDtype and object dtype differ.

Example

list_ = ['python',90, 'string',2] # assign a list
ds = pd.Series(list_) # create a Series
print(ds) # print series
print()
print(type(ds[1])) # display type of 2nd element from dataSeries

Explanation

Above code, created a pandas Series with the list of 3 elements, those elements have strings as well as integers. and in the last line, we try to get the data type of the 2nd element.

Output

0   python
1       90
2   string
3        2
dtype: object

<class 'int'>

We can clearly see that the dtype of ds Series is an object, but if you try to get the type of a 2nd element it returns the output as an integer, not an object or a string. So it concludes, the dtype object doesn’t store only text data, it is a mixture of all data.

Example

Here define pd.StringDtype() explicitly to the dtype parameter of the pandas series method.

list_ = ['python',90, 'string']
ds = pd.Series(list_, dtype=pd.StringDtype())
print(ds)
print()
print(type(ds[1]))

Explanation

In this example we changed the default dtype of the series by assigning dtype parameter value as string. And again we try to display the type of 2nd element from series ds.

Output

0   python
1       90
2   string
dtype: string

<class 'str'>

The output dtype of series ds is a string and also the type of 2nd element of that ds is a string. So we can understand that the dtype StringDtype will change the type of all data.

By defining StringDtype to textual data that won’t create any difficulties to perform string operations. That is the reason it is recommended to use StringDtype to store all textual data.

Updated on: 18-Nov-2021

555 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements