I have a folder with an increasing number of excel files containing daily data. I want to create a dataframe that expands as the number of files increase. The new data must stack under the existing data, to extend the time-series and then save the dataframe to a new excel output file. I have this code so far. Some parts work and I do get an output file, however it is stacked in random order and not in the order from the folder/directory.
# loop over excel files in cwd
all_data = pd.DataFrame() # create empty list
list = []
for f in glob.glob("/Users/xxxx/xxxx/xxxx/Data/*.xlsx"):
df = pd.read_excel(f, sheet_name= 'data')
list.append(df)
all_data = pd.concat(list)
# save the data
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer, 'sheet1')
writer.save()
Thank you
CodePudding user response:
all_data = pd.DataFrame() # create empty list
list = []
for f in glob.glob("/Users/xxxx/xxxx/xxxx/Data/*.xlsx"):
df = pd.read_excel(f, sheet_name= 'data')
list.append(df)
all_data = pd.concat(list)
all_data.to_excel("/Users/xxxx/xxxx/xxxx/Data/all_data.xlsx")
CodePudding user response:
Assuming that you have Excel 365 and that all your files in the folder have the same structure, you can avoid vba altogether an instead do a Power Query specifying the folder as the datasource. The result will be a table that you can refresh every time you add a file to the folder.