How can I replicate mixed dtype warning from pandas?

Time:10-04

I have read that in pandas, reading a column whose elements have mixed dtypes produces the warning DtypeWarning: Columns (X,X) have mixed types. Specify dtype option on import or set low_memory=False.

So I am trying to reproduce this warning, but I can't seem to trigger it. Below is the code I am using:

import numpy as np
import pandas as pd

dfs = pd.DataFrame({'a': ['a', 'b', 'c', 1., 2., np.nan], 'b': ['d', 'e', 'f', 'g', 'h', np.nan]})
dfs = pd.concat([dfs for _ in range(10000)], axis=0)
dfs.to_csv('test_csv.csv', index=False)
pd.read_csv('test_csv.csv')

This leads to no warnings and just reads the data as object types. Can you please help me understand where I am going wrong?

CodePudding user response:

According to the documentation for pandas.errors.DtypeWarning:

This warning is issued when dealing with larger files because the dtype checking happens per chunk read.

This is not crystal clear, but I think the reason you're not getting the warning is that every chunk read contains the same mix of types, so read_csv infers object for the column in each chunk and the chunks never disagree.

When I create a dataframe with different types spread out in different chunks (i.e., long chunks of the same data type before switching to a different type), I get the warning.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ([1.] * 100000 + ['a'] * 100000 + [np.nan] * 100000),
                   'b': ([1.] * 100000 + ['b'] * 100000 + [np.nan] * 100000)})

df.to_csv('test.csv', index=False)
df2 = pd.read_csv('test.csv')

Output:

sys:1: DtypeWarning: Columns (0,1) have mixed types.Specify dtype option on import or set low_memory=False.
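The warning message itself names two ways to avoid it: pass a dtype, or set low_memory=False. Here is a minimal sketch that counts DtypeWarning occurrences under each option. The helper count_dtype_warnings and the temporary file path are my own illustrative names, not part of pandas, and whether the default read actually warns may depend on your pandas version and how the chunk boundaries fall.

```python
import os
import tempfile
import warnings

import numpy as np
import pandas as pd


def count_dtype_warnings(path, **read_csv_kwargs):
    """Hypothetical helper: read the CSV and count DtypeWarning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        pd.read_csv(path, **read_csv_kwargs)
    return sum(issubclass(w.category, pd.errors.DtypeWarning) for w in caught)


# Same layout as the example above: long runs of one type, then another.
n = 100_000
df = pd.DataFrame({
    "a": [1.0] * n + ["a"] * n + [np.nan] * n,
    "b": [1.0] * n + ["b"] * n + [np.nan] * n,
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "test.csv")
    df.to_csv(csv_path, index=False)

    # Default settings: types are inferred chunk by chunk.
    default_count = count_dtype_warnings(csv_path)
    # Explicit dtype: nothing is inferred, so chunks cannot disagree.
    explicit_dtype_count = count_dtype_warnings(
        csv_path, dtype={"a": str, "b": str}
    )
    # low_memory=False: each column is inferred once over the whole file.
    whole_file_count = count_dtype_warnings(csv_path, low_memory=False)

print("default read:", default_count)
print("explicit dtype:", explicit_dtype_count)
print("low_memory=False:", whole_file_count)
```

Passing dtype is usually the better choice when you know what the column should hold, since low_memory=False only silences the warning by reading the column in one pass and still leaves you with a mixed object column.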