I have a dataframe
dfio
Name Age Zip
Good 22 03764
Bad 22 98765
The df is read from an xlsx file using
dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', dtype=str,engine='openpyxl')
The zips within this df are in zip5 format. but when i convert my df to csv using
folder_to_export_path = "//i-data-share/jobs/Member Lists"
dfio.to_csv(folder_to_export_path 'Outputest.csv')
This runs successfully, but when i review the data both in the csv and as a notepad file the zip code for row 1 shows as '3764' instead of '03764'. How do I keep the format as zip 5 and mantain the leading 0's on conversion to csv?
CodePudding user response:
When you read the Excel file, pd.read_excel
detects that the Zip
column contains integers, and therefore it sets the dtype of that column in the resulting dataframe to int64
. Since integers in Python don't have leading zeros...those get dropped.
To avoid this, specify dtype='object'
, instead of dtype=str
(because strings are object
in Python, not str
), in your pd.read_excel
call to have Pandas not interpret any dtypes:
dfio = pd.read_excel(r'//cfpet-data-share/jobs/Vet Member Lists/Results/Output.xlsx', dtype='object', engine='openpyxl')
CodePudding user response:
I overcame a similar problem by declaring a dict containing data types of certain columns, then including that dict in the command to read the file.
converter = dict()
converter['Zip'] = str
dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', converters=converter,engine='openpyxl')