I have read that in pandas, if a column has elements of mixed dtypes, reading it can raise DtypeWarning: Columns (X,X) have mixed types. Specify dtype option on import or set low_memory=False.
So I am trying to replicate this warning, but I can't seem to do it. Below is the code I am using to trigger the said behavior:
import numpy as np
import pandas as pd

dfs = pd.DataFrame({'a': ['a', 'b', 'c', 1., 2., np.nan],
                    'b': ['d', 'e', 'f', 'g', 'h', np.nan]})
dfs = pd.concat([dfs for i in range(10000)], axis=0)
dfs.to_csv('test_csv.csv', index=False)
pd.read_csv('test_csv.csv')
This leads to no warnings and just reads the data as object types. Can you please help me understand where I am going wrong?
CodePudding user response:
According to the documentation for pandas.errors.DtypeWarning:
This warning is issued when dealing with larger files because the dtype checking happens per chunk read.
This is not crystal clear, but I think the reason you're not getting the warning is that you have different types within each chunk of data read, so read_csv automatically converts them to object.
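One way to make this concrete is to read your CSV with an explicit chunksize and inspect each chunk's inferred dtype. (This is only a stand-in for the internal low_memory chunking, not the exact same code path, but it shows the same inference: every chunk already mixes strings and floats, so every chunk comes back as object, and there is nothing inconsistent across chunks to warn about.)

```python
import numpy as np
import pandas as pd

# Rebuild the questioner's CSV: types alternate within every stretch of rows
dfs = pd.DataFrame({'a': ['a', 'b', 'c', 1., 2., np.nan],
                    'b': ['d', 'e', 'f', 'g', 'h', np.nan]})
dfs = pd.concat([dfs for i in range(10000)], axis=0)
dfs.to_csv('test_csv.csv', index=False)

# Each 10000-row chunk contains both strings and floats, so each one is
# inferred as object -- the chunks never disagree with each other.
for chunk in pd.read_csv('test_csv.csv', chunksize=10000):
    print(chunk['a'].dtype)
```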
When I create a dataframe with different types spread out in different chunks (i.e., long chunks of the same data type before switching to a different type), I get the warning.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1.] * 100000 + ['a'] * 100000 + [np.nan] * 100000,
                   'b': [1.] * 100000 + ['b'] * 100000 + [np.nan] * 100000})
df.to_csv('test.csv', index=False)
df2 = pd.read_csv('test.csv')
Output:
sys:1: DtypeWarning: Columns (0,1) have mixed types.Specify dtype option on import or set low_memory=False.
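And, as the warning text itself suggests, either remedy silences it. A quick sketch (reusing the same test.csv as above, and escalating DtypeWarning to an error so we would notice if it still fired):

```python
import warnings
import numpy as np
import pandas as pd

# Recreate the CSV from the snippet above
df = pd.DataFrame({'a': [1.] * 100000 + ['a'] * 100000 + [np.nan] * 100000,
                   'b': [1.] * 100000 + ['b'] * 100000 + [np.nan] * 100000})
df.to_csv('test.csv', index=False)

with warnings.catch_warnings():
    # Fail loudly if a DtypeWarning is still emitted
    warnings.simplefilter('error', pd.errors.DtypeWarning)
    # Option 1: read the whole file in one pass so dtype is inferred once
    df2 = pd.read_csv('test.csv', low_memory=False)
    # Option 2: declare the dtypes up front
    df3 = pd.read_csv('test.csv', dtype={'a': str, 'b': str})

print(df2['a'].dtype, df3['a'].dtype)  # both object
```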