i a trying to bring the Data from many Excel files together. There are columns for Company name, city, and costs.
The problem is: on a lot of the hundreds of excel sheets, the costs cells are customized. if i read them with python they are shown as datetime.datetime. i have no idea why python change a customized cell to a timestamp and i have a lot less ideas how to solve this. i already tryed a lot but it mostly says cant convert daytime.daytime to int oder float.... any ideas?
pfad3 = r"C://Users//Schwamm//Desktop//Submission//test1//"
datenlage = glob.glob(pfad3 "*.xlsx")
i = 0
concat_all_files = pd.DataFrame()
for file in datenlage:
i = i 1
filename = datenlage
#DF = pd.read_excel(file, sheet_name=None, index_col=0, header=0,usecols=[0,1,2,3,4])
DF = pd.read_excel(file, sheet_name=None, skiprows=None, nrows=None, usecols=None, header=1, index_col=None)
concat_all_files_single = pd.concat(DF,sort=False)
concat_all_files_single['Dateiname']=(file.split("\\")[-1])
concat_all_files_single['Laufendenummer'] = (i)
concat_all_files = concat_all_files.append(concat_all_files_single)
#concat_all_files = concat_all_files.append(datenlage)
print(concat_all_files)
Image: https://imagizer.imageshack.com/img924/8097/rzK9ms.jpg
CodePudding user response:
even if i try to make all an object.
DF = pd.read_excel(file, dtype=object, sheet_name=None, skiprows=None, nrows=None, usecols=None, header=1, index_col=None)
still same issue...
CodePudding user response:
Note Question needs improvment and some more debugging details
Per default pandas
tries to sniff the types of column values while reading in, but you can force it to handle the costs
column as str
:
pd.read_excel(file, dtype={'costs':'object'})
You also can force pandas
to read the whole data as str
:
pd.read_excel(file, dtype=object)
EDIT
As DS_London mentioned correctly - To ensure the doubles are imported:
pd.read_excel(file, dtype = {'costs':np.float64})