I am trying to concatenate two Excel files that have the same column names, but new empty columns/spaces are being added to the resulting Excel file, and I don't know why.
I used the pd.concat() function, which was supposed to combine the two files into one single sheet and make a new file. But when it appends the table from the second file to the first, new columns/spaces show up in the merged file.
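import glob
import pandas as pd

# path is the folder that holds the .xlsx files
file_list = glob.glob(path + "/*.xlsx")
# read every workbook into its own DataFrame
dfs = [pd.read_excel(p) for p in file_list]
print(dfs[0].shape)
res = pd.concat(dfs)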
That is a snippet of my code.
I have also added a picture of the result I am getting now.
CodePudding user response:
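pd.concat aligns the DataFrames by column name, so it does not behave like a plain vector concatenation. Any column name that is not exactly identical across files (for example, a trailing space or different letter case) becomes a separate column, filled with NaN for the rows that come from the other file. Check whether the column names really are the same in all of your source files. If they are not, you can normalize or rename them, or switch to a position-based format such as NumPy arrays.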
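As a rough illustration of the normalize-then-concat idea, here is a minimal sketch. The two small DataFrames are made-up stand-ins for what pd.read_excel would return, and normalize_columns is a hypothetical helper, not part of pandas. It assumes the mismatch is only stray whitespace or letter case; other differences would need an explicit rename.

```python
import pandas as pd

# Made-up stand-ins for two sheets loaded with pd.read_excel (assumption)
df_a = pd.DataFrame({"Name": ["Ann"], "Amount": [10]})
df_b = pd.DataFrame({"name ": ["Bob"], "Amount": [20]})  # note case and trailing space


def normalize_columns(df):
    """Hypothetical helper: return a copy with stripped, lower-cased column labels."""
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    return out


# Without normalizing, "Name" and "name " are treated as different columns
raw = pd.concat([df_a, df_b], ignore_index=True)

# After normalizing, the labels match and the rows stack into the same columns
clean = pd.concat([normalize_columns(d) for d in (df_a, df_b)], ignore_index=True)

print(raw.columns.tolist())
print(clean.columns.tolist())
```

Printing the column lists with tolist() (or repr) also makes invisible differences such as trailing spaces easier to spot in your real files.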