Extracting multiple excel files as Pandas data frame

Time:09-09

I'm trying to create a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into pandas DataFrames. The tabs are structured the same way in every Excel file. Any help would be appreciated!

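import os
import pandas as pd

folder = "specified_path"
files = os.listdir(folder)
sheet_contents = {}

for file in files:
    # Open the workbook once, then parse each of its sheets
    data = pd.ExcelFile(os.path.join(folder, file))
    file_data = {}

    for sheet in data.sheet_names:
        file_data[sheet] = data.parse(sheet)

    # Key the result by file name without its extension (.xlsx or .xls)
    sheet_contents[os.path.splitext(file)[0]] = file_data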

Answer:

One way to create a DataFrame for each Excel file (stored in a specific folder, each holding multiple sheets) is to combine pandas.read_excel with pandas.concat. Passing sheet_name=None to pandas.read_excel reads all the sheets in a workbook at once and returns them as a dict of DataFrames keyed by sheet name; pandas.concat then stacks those sheets into a single DataFrame.
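To see what the two calls do together, here is a minimal sketch that uses a hand-built dict as a stand-in for the return value of pd.read_excel(path, sheet_name=None), so it runs without an Excel file. The sheet names and columns are hypothetical. It also shows a variant that keeps the sheet name as a column instead of discarding it with ignore_index=True.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): a dict mapping
# sheet name -> DataFrame (sheet names and columns are hypothetical).
sheets = {
    "Sheet1": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "Sheet2": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

# ignore_index=True stacks the sheets, renumbers rows 0..n-1,
# and drops the sheet names.
stacked = pd.concat(sheets, ignore_index=True)

# Without ignore_index, the dict keys become an outer index level;
# reset_index turns that level into an ordinary "sheet" column.
labelled = (
    pd.concat(sheets, names=["sheet", "row"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)
```

The labelled variant is useful when rows from different tabs need to be told apart after stacking.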

Try this:

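import os
import pandas as pd

folder = 'specified_path'

# Keep only Excel workbooks (os.listdir also returns any other files in the folder)
excel_files = [file for file in os.listdir(folder) if file.endswith(('.xlsx', '.xls'))]

list_of_dfs = []
for file in excel_files:
    # sheet_name=None returns {sheet name: DataFrame}; concat stacks all sheets
    df = pd.concat(pd.read_excel(os.path.join(folder, file), sheet_name=None), ignore_index=True)
    # Record which workbook each row came from
    df['excelfile_name'] = os.path.splitext(file)[0]
    list_of_dfs.append(df)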
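While a workbook is open, Excel leaves a hidden lock file whose name starts with "~$" in the same folder, and a plain extension check would try to read it. A minimal sketch of a stricter file listing with pathlib follows; the helper name list_excel_files is hypothetical, not part of pandas.

```python
from pathlib import Path


def list_excel_files(folder):
    """Return the Excel workbooks in `folder`, sorted by name (hypothetical helper).

    Skips subdirectories, non-Excel files, and the "~$" lock files
    Excel creates while a workbook is open. The extension check is
    case-insensitive, so "REPORT.XLSX" is included.
    """
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in {".xlsx", ".xls"}
        and not p.name.startswith("~$")
    )
```

Each returned Path can be passed directly to pd.read_excel, and path.stem gives the file name without its extension, which could replace os.path.splitext(file)[0] in the loop.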

To access one of the DataFrames created, use its position in the list (e.g., list_of_dfs[0]):

print(type(list_of_dfs[0]))
<class 'pandas.core.frame.DataFrame'>
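If a single DataFrame for the whole folder is more convenient than positional access, the list can be concatenated once more, and the excelfile_name column still identifies each row's source. A minimal sketch with hand-built stand-ins for list_of_dfs (the file names and the qty column are hypothetical):

```python
import pandas as pd

# Stand-ins for the per-file frames built by the loop in the answer;
# the file names and the "qty" column are hypothetical.
list_of_dfs = [
    pd.DataFrame({"qty": [1, 2], "excelfile_name": ["sales_2021", "sales_2021"]}),
    pd.DataFrame({"qty": [3], "excelfile_name": ["sales_2022"]}),
]

# One frame for every file; rows are renumbered 0..n-1.
combined = pd.concat(list_of_dfs, ignore_index=True)

# Select one workbook's rows by name instead of by list position.
sales_2022 = combined[combined["excelfile_name"] == "sales_2022"]
```

This works cleanly because the tabs share the same structure in every file; with mismatched columns, pd.concat would fill the gaps with NaN.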