Home > Mobile >  How to get sheet names for list of files dynamically?
How to get sheet names for list of files dynamically?

Time:04-04

I have the following list allfiles:

['Input_controls\\CTA-02 Q2.xlsx',
 'Input_controls\\CTA-14 Q2.xlsx',
 'Input_controls\\CTA-23 Q2.xlsx',
 'Input_controls\\CTA-57 Q2.xlsx',
 'Input_controls\\PVHE FMS GRC CTA Analysis Q2 2021.xlsx',
 'Input_controls\\PVHE FMS GRC SoD Analysis Q2 2021.xlsx',
 'Input_controls\\SOD-16 Q2.xlsb',
 'Input_controls\\SOD-21_Q2.xlsx',
 'Input_controls\\SOD-74 Q2.xlsx']

Now I would like a df with all files and their respective sheets So ideal output would be:

file                             sheet_name#1 sheet_name#2 sheet_name#x

'Input_controls\\CTA-02 Q2.xlsx'  foo           bar           test

The above for each input file

I've drafted the following code but it keeps on executing (never stops):

#bad
#xl = pd.ExcelFile('archvio.xlsx')
df_combined = pd.DataFrame()
for file in allfiles:
    xl = pd.ExcelFile(file)
    for sheet_name in xl.sheet_names:
        df = xl.parse(sheet_name)
        df['sheet'] = sheet_name       # this adds `sheet_name` into the column `Week`
        df_combined = df_combined.append(df)

CodePudding user response:

I'm the first to answer! I mean what you can do is at the end add a break and that's all! Easy, right? Fixed code below!

#not bad anymore
#xl = pd.ExcelFile('archvio.xlsx')
df_combined = pd.DataFrame()
for file in allfiles:
    xl = pd.ExcelFile(file)
    for sheet_name in xl.sheet_names:
        df = xl.parse(sheet_name)
        df['sheet'] = sheet_name       # this adds `sheet_name` into the column `Week`
        df_combined = df_combined.append(df)
        break

CodePudding user response:

IIUC, you can use a simple dictionary comprehension:

df = pd.concat({f: pd.Series(pd.ExcelFile(f).sheet_names).rename(lambda x: x 1)
                for f in files}, axis=1).T.add_prefix('sheet_')

Here an example on a pandas test sample:

                                sheet_1    sheet_2   sheet_3 sheet_4
times_1904.xlsx                  Sheet1        NaN       NaN     NaN
test_types.xlsx                  Sheet1        NaN       NaN     NaN
testmultiindex.xlsx  single_column_name  mi_column  mi_index    both
  • Related