I have an excel file with over one hundred sheets. Each sheet has the same columns with information. I have concatenated all the sheets in one single file.
Script
single_file = pd.concat(pd.read_excel('multiple_sheets.xlsx',sheet_name=None),ignore_index=True)
single_file.to_csv('single_file.csv')
The script works fine. The thing is I would like to add a column that identifies each row of information with the name of its original sheet. Is that possible?
Thanks in advance, Fernando
CodePudding user response:
You'll want to save the pd.concat
for the very end of the script.
You've read in the file with sheetname = None
which creates a dictionary of sheet names as the key, and the dataframe as the value. Loop over each sheet and add your label column first, then concatenate.
CodePudding user response:
single_file = pd.read_excel('multiple_sheets.xlsx',sheet_name=None)
single_file = pd.concat([sheet.assign(identifier=i) for i,sheet in single_file.items()])
Here we are using the idea that all sheets can be accessed by iterating (using items()
) over the dictionary of dataframes.