I'm trying to append multiple Excel files with same columns into one. If I use this code x.append(y, ignore_index = True)
it doesn't work. At the end of the for loop it returns only table of x. However if I try to run x.append(y, ignore_index = True)
in a single code block it works fine to append y which is still in the memory after for loop. I am using Juptyer Notebook.
# import required module
import os as os
import pandas as pd
# assign directory
# directory = 'C:\\Users\\Tomas\\Documents\\Python Scripts\\csv\\TimeLogs'
directory = 'C:\\Users\\Tomas\\Documents\\Python Scripts\\csv\\tmp'
# iterate over files in
# that directory
for idx,filename in enumerate(os.listdir(directory)):
f = os.path.join(directory, filename)
# checking if it is a file
if os.path.isfile(f):
print(f)
print(idx)
if idx == 0:
x = pd.read_excel(f,engine="openpyxl")
else:
y = pd.read_excel(f,engine="openpyxl")
x.append(y, ignore_index = True)
CodePudding user response:
You can create a list of dataframes and then concatenate them using pandas' concat
method.
# import required module
import os as os
import pandas as pd
# assign directory
# directory = 'C:\\Users\\Tomas\\Documents\\Python Scripts\\csv\\TimeLogs'
directory = 'C:\\Users\\Tomas\\Documents\\Python Scripts\\csv\\tmp'
# iterate over files in
# that directory
list_of_dataframes=[]
for idx,filename in enumerate(os.listdir(directory)):
f = os.path.join(directory, filename)
# checking if it is a file
if os.path.isfile(f):
print(f)
list_of_dataframes.append(pd.read_excel(f,engine="openpyxl"))
merged_df=pd.concat(list_of_dataframes)
This way, you do not have to check whether the index idx
is equal to 0.