To prevent automatic type change in Pandas-CodePudding

I have a excel (.xslx) file with 4 columns: pmid (int) gene (string) disease (string) label (string)

I attempt to load this directly into python with pandas.read_excel


        df = pd.read_excel(path, parse_dates=False)

capture from excel

capture from pandas using my ide debugger

As shown above, pandas tries to be smart, automatically converting some of gene fields such as 3.Oct, 4.Oct to a datetime type. The issue is that 3.Oct or 4.Oct is a abbreviation of Gene type and totally different meaning. so I don't want pandas to do so. How can I prevent pandas from converting types automatically?

CodePudding user response：

pd.read_excel has a dtype argument you can use to specify data types explicitly.

CodePudding user response：

Force dtype=str to prevent Pandas try to transform your dataframe

df = pd.read_excel(path, dtype=str)

Or use converters={'colX': str, ...} to map the dtype for each columns.