To prevent automatic type change in Pandas

Time:10-17

I have an Excel (.xlsx) file with 4 columns: pmid (int), gene (string), disease (string), label (string).

I attempt to load this directly into Python with pandas.read_excel:


        df = pd.read_excel(path, parse_dates=False)

Capture from Excel:

[image: excel snippet]

Capture from pandas, using my IDE debugger:

[image: pandas]

As shown above, pandas tries to be smart and automatically converts some of the gene fields, such as 3.Oct and 4.Oct, to a datetime type. The problem is that 3.Oct and 4.Oct are abbreviations of gene types and mean something entirely different, so I don't want pandas to do this. How can I prevent pandas from converting types automatically?

CodePudding user response:

pd.read_excel has a dtype argument you can use to specify data types explicitly.
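
As a rough sketch of what that could look like for the four columns in the question (the names `load_table` and `COLUMN_DTYPES` are made up for illustration, and the column names are assumed to match the header row exactly):

```python
import pandas as pd

# Assumed column names, taken from the question; adjust to the real header row.
COLUMN_DTYPES = {"pmid": "int64", "gene": str, "disease": str, "label": str}


def load_table(path):
    """Read the workbook with explicit per-column dtypes instead of letting pandas infer them."""
    return pd.read_excel(path, dtype=COLUMN_DTYPES)
```

You would call it as df = load_table(path). Note that "int64" for pmid assumes the column has no empty cells.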

CodePudding user response:

Force dtype=str to prevent pandas from trying to convert the values in your dataframe:

df = pd.read_excel(path, dtype=str)

Or use converters={'colX': str, ...} to set the conversion for each column individually.
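
A minimal sketch of the converters variant, assuming the column is really named gene; `as_text` and `load_with_converters` are illustrative names, not part of pandas:

```python
import pandas as pd


def as_text(value):
    """Hypothetical converter: keep the cell as text and trim stray whitespace."""
    return str(value).strip()


def load_with_converters(path):
    # converters maps a column label to a function that is applied to each cell
    # of that column; columns not listed are still inferred by pandas.
    return pd.read_excel(path, converters={"gene": as_text})
```

Either way, this only affects how pandas interprets what is stored in the file. If Excel itself already saved the cell as a date, you will most likely get that date written out as text rather than the original label, and the fix then has to happen in the workbook (for example, formatting the column as Text before entering the values).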
