Home > Mobile >  Python Pandas. Loop through Excel Worksheets and clean before concatenate
Python Pandas. Loop through Excel Worksheets and clean before concatenate

Time:01-19

I have a spread-sheet with many tabs. What I am trying to achieve is to (1) clean each tab and then (2) merge them together into a new data frame. Number (1) is working fine:

import os
import pandas as pd

df = pd.read_excel('Royalties 2021.xlsx')
df = df.fillna('')
df.columns = (df.iloc[1]   ' '   df.iloc[2])
df = df[df.iloc[:,0] == 'TOTAL']

I can merge the Sheets:

import pandas as pd
import os

sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None)
df = pd.DataFrame()
for s in sheets:
    df_main = pd.concat(sheets, ignore_index=True)
df_main

But when I try to complete (2) the entire code, nothing seems to work:

import pandas as pd
import os

sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None)
df = pd.DataFrame()

for s in sheets:
    sheets = df.fillna('')
    sheets = (df.iloc[1]   ' '   df.iloc[2])
    sheets = df[df.iloc[:,0] == 'TOTAL']
    df_main = pd.concat(sheets, ignore_index=True)

df_main

Would anyone please help? I would appreciate that! Thanks!

CodePudding user response:

Assuming the tables on all your sheets has the same schema (same columns and same order), then you might want to use something like this:

import pandas as pd
import os

sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None).values()
dfs = []
for s in sheets:
    s = s.fillna('')
    s.columns = (s.iloc[1]   ' '   s.iloc[2])
    s = s[s.iloc[:,0] == 'TOTAL']
    dfs.append(s)
df_main = pd.concat(dfs)

Careful, in your snippet, you are not using df that you defined and you are applying transformations on sheets (dict of dataframes) rather than s, resulting from the iteration on sheets.

  • Related