Extracting columns from an excel file with python ignoring customised cell settings-CodePudding

i a trying to bring the Data from many Excel files together. There are columns for Company name, city, and costs.

The problem is: on a lot of the hundreds of excel sheets, the costs cells are customized. if i read them with python they are shown as datetime.datetime. i have no idea why python change a customized cell to a timestamp and i have a lot less ideas how to solve this. i already tryed a lot but it mostly says cant convert daytime.daytime to int oder float.... any ideas?

pfad3 = r"C://Users//Schwamm//Desktop//Submission//test1//"
datenlage = glob.glob(pfad3   "*.xlsx")
i = 0
concat_all_files = pd.DataFrame()
for file in datenlage:
    i = i   1
    filename = datenlage
    #DF = pd.read_excel(file, sheet_name=None, index_col=0, header=0,usecols=[0,1,2,3,4])
    DF = pd.read_excel(file, sheet_name=None, skiprows=None, nrows=None, usecols=None, header=1, index_col=None)
    concat_all_files_single = pd.concat(DF,sort=False)
    concat_all_files_single['Dateiname']=(file.split("\\")[-1])
    concat_all_files_single['Laufendenummer'] = (i)
    concat_all_files = concat_all_files.append(concat_all_files_single)
    #concat_all_files = concat_all_files.append(datenlage)
print(concat_all_files)

Image: https://imagizer.imageshack.com/img924/8097/rzK9ms.jpg

CodePudding user response：

even if i try to make all an object.

DF = pd.read_excel(file, dtype=object, sheet_name=None, skiprows=None, nrows=None, usecols=None, header=1, index_col=None)

still same issue...

CodePudding user response：

Note Question needs improvment and some more debugging details

Per default pandas tries to sniff the types of column values while reading in, but you can force it to handle the costs column as str:

pd.read_excel(file, dtype={'costs':'object'})

You also can force pandas to read the whole data as str:

pd.read_excel(file, dtype=object)

EDIT

As DS_London mentioned correctly - To ensure the doubles are imported:

pd.read_excel(file, dtype = {'costs':np.float64})