I'm trying to create a data ingestion routine to load data from multiple excel files with multiple tabs and columns in the pandas data frame. The structuring of the tabs in each of the excel files is the same. Any help would be appreciated!!
folder = "specified_path"
files = os.listdir(folder)
sheet_contents = {}
for file in files:
data = pd.ExcelFile(folder file)
file_data = {}
for sheet in data.sheet_names:
file_data[sheet] = data.parse(sheet)
sheet_contents[file[:-5]] = file_data
CodePudding user response:
One of the ways to create a dataframe for each excelfile (stored in a specific folder and that holds multiple sheets) is by using pandas.read_excel
and pandas.concat
combined. By passing the parameter sheet_name=None
to pandas.read_excel
, we can read in all the sheets in the excelfile at one time.
Try this :
import os
import pandas as pd
folder = 'specified_path'
excel_files = [file for file in os.listdir(folder)]
list_of_dfs = []
for file in excel_files :
df = pd.concat(pd.read_excel(folder "\\" file, sheet_name=None), ignore_index=True)
df['excelfile_name'] = file.split('.')[0]
list_of_dfs.append(df)
To access to one of the dataframes created, you can use its index (e.g, list_of_dfs[0]
) :
print(type(list_of_dfs[0]))
<class 'pandas.core.frame.DataFrame'>