I have a spread-sheet with many tabs. What I am trying to achieve is to (1) clean each tab and then (2) merge them together into a new data frame. Number (1) is working fine:
import os
import pandas as pd
df = pd.read_excel('Royalties 2021.xlsx')
df = df.fillna('')
df.columns = (df.iloc[1] ' ' df.iloc[2])
df = df[df.iloc[:,0] == 'TOTAL']
I can merge the Sheets:
import pandas as pd
import os
sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None)
df = pd.DataFrame()
for s in sheets:
df_main = pd.concat(sheets, ignore_index=True)
df_main
But when I try to complete (2) the entire code, nothing seems to work:
import pandas as pd
import os
sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None)
df = pd.DataFrame()
for s in sheets:
sheets = df.fillna('')
sheets = (df.iloc[1] ' ' df.iloc[2])
sheets = df[df.iloc[:,0] == 'TOTAL']
df_main = pd.concat(sheets, ignore_index=True)
df_main
Would anyone please help? I would appreciate that! Thanks!
CodePudding user response:
Assuming the tables on all your sheets has the same schema (same columns and same order), then you might want to use something like this:
import pandas as pd
import os
sheets = pd.read_excel('Royalties 2021.xlsx', sheet_name=None).values()
dfs = []
for s in sheets:
s = s.fillna('')
s.columns = (s.iloc[1] ' ' s.iloc[2])
s = s[s.iloc[:,0] == 'TOTAL']
dfs.append(s)
df_main = pd.concat(dfs)
Careful, in your snippet, you are not using df
that you defined and you are applying transformations on sheets
(dict of dataframes) rather than s
, resulting from the iteration on sheets
.