How can I simplify my pandas script using a loop?

Time:08-12

I have the following code:

import pandas as pd

df22=pd.read_excel(r"C:\Users\H\Desktop\Files\Table22.xlsx")

#Select the sheets that are to be transformed
df3=pd.read_excel(r"C:\Users\H\Desktop\Files\Table3.xlsx")
df4=pd.read_excel(r"C:\Users\H\Desktop\Files\Table4.xlsx")
df5=pd.read_excel(r"C:\Users\H\Desktop\Files\Table5.xlsx")
df6=pd.read_excel(r"C:\Users\H\Desktop\Files\Table6.xlsx")
df7=pd.read_excel(r"C:\Users\H\Desktop\Files\Table7.xlsx")
df8=pd.read_excel(r"C:\Users\H\Desktop\Files\Table8.xlsx")
df9=pd.read_excel(r"C:\Users\H\Desktop\Files\Table9.xlsx")
df10=pd.read_excel(r"C:\Users\H\Desktop\Files\Table10.xlsx")
df11=pd.read_excel(r"C:\Users\H\Desktop\Files\Table11.xlsx")
df12=pd.read_excel(r"C:\Users\H\Desktop\Files\Table12.xlsx")
df13=pd.read_excel(r"C:\Users\H\Desktop\Files\Table13.xlsx")
df14=pd.read_excel(r"C:\Users\H\Desktop\Files\Table14.xlsx")
df15=pd.read_excel(r"C:\Users\H\Desktop\Files\Table15.xlsx")
df16=pd.read_excel(r"C:\Users\H\Desktop\Files\Table16.xlsx")
df17=pd.read_excel(r"C:\Users\H\Desktop\Files\Table17.xlsx")
df18=pd.read_excel(r"C:\Users\H\Desktop\Files\Table18.xlsx")
df19=pd.read_excel(r"C:\Users\H\Desktop\Files\Table19.xlsx")
df20=pd.read_excel(r"C:\Users\H\Desktop\Files\Table20.xlsx")
df21=pd.read_excel(r"C:\Users\H\Desktop\Files\Table21.xlsx")

df=pd.concat([df22,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,df15,df16,df17,df18,df19,df20,df21], join='inner')

df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False) 

It concatenates Table22.xlsx with the tables for weeks 3 to 21. Does anyone know how this script can be improved? I tried to use a loop but couldn't get it to work.

CodePudding user response:

Use a list comprehension:

df22=pd.read_excel(r"C:\Users\H\Desktop\Files\Table22.xlsx")
dfs = [pd.read_excel(rf"C:\Users\H\Desktop\Files\Table{x}.xlsx") for x in range(3, 22)]
df=pd.concat([df22] + dfs, join='inner')

df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False)
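In case it is unclear what join='inner' contributes here: pd.concat then keeps only the columns that all of the DataFrames share. Below is a minimal sketch with two small in-memory frames. The frame names and column names are made up for illustration and are not taken from the question's Excel files.

```python
import pandas as pd

# Two hypothetical weekly tables; only "item" and "qty" appear in both.
week_a = pd.DataFrame({"item": ["a", "b"], "qty": [1, 2], "note": ["x", "y"]})
week_b = pd.DataFrame({"item": ["c"], "qty": [3], "price": [9.5]})

# join="inner" keeps the intersection of the columns.
inner = pd.concat([week_a, week_b], join="inner")

# The default, join="outer", keeps every column and fills the gaps with NaN.
outer = pd.concat([week_a, week_b])

print(list(inner.columns))
print(list(outer.columns))
```

So if the weekly files do not all have identical columns, dropping join='inner' changes which columns end up in Allweeks.xlsx.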

Or read all the DataFrames into one list, then move the last DataFrame (Table22) to the front:

dfs = [pd.read_excel(rf"C:\Users\H\Desktop\Files\Table{x}.xlsx") for x in range(3, 23)]
df=pd.concat(dfs[-1:] + dfs[:-1], join='inner')
# another idea is to reverse the order: 22, 21, 20, ..., 3
#df=pd.concat(dfs[::-1], join='inner')

df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False)
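The two orderings can be checked without reading any files by applying the same slicing to the week numbers alone. This is only an illustration; `weeks` is a stand-in for the list of DataFrames, not something from the original script.

```python
# Stand-in for dfs: one entry per file, Table3 ... Table22.
weeks = list(range(3, 23))

# Move the last element to the front: 22, 3, 4, ..., 21.
rotated = weeks[-1:] + weeks[:-1]

# Reverse the whole list: 22, 21, 20, ..., 3.
reversed_weeks = weeks[::-1]

print(rotated[:3], rotated[-1])
print(reversed_weeks[:3], reversed_weeks[-1])
```

The first form reproduces the row order of the original script (Table22 first, then 3 to 21); the reversed form puts Table22 first as well but changes the order of the remaining weeks.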

CodePudding user response:

You could use a for-loop to read the files from Table3 to Table21 and concatenate each DataFrame onto Table22, for example:

import pandas as pd

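# Raw strings (r'...') are needed: in a normal string literal, '\U' in a
# Windows path is read as the start of a unicode escape and raises a SyntaxError.
df22 = pd.read_excel(r'C:\Users\H\Desktop\Files\Table22.xlsx')
for i in range(3, 22):
    df22 = pd.concat([df22, pd.read_excel(r'C:\Users\H\Desktop\Files\Table' + str(i) + '.xlsx')])

df22.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index=False)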

Note that the integer i has to be converted to a string with str(i) before it can be concatenated into the file path.
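If the str(i) conversion feels clumsy, an f-string formats the integer directly. Here is a small sketch that builds the list of file paths with pathlib; the helper name `table_paths` and its parameters are hypothetical, chosen only for this example.

```python
from pathlib import Path


def table_paths(folder, first, last):
    """Return the paths Table<first>.xlsx ... Table<last>.xlsx inside folder."""
    folder = Path(folder)
    # The f-string formats the integer itself, so no str(i) is needed.
    return [folder / f"Table{i}.xlsx" for i in range(first, last + 1)]


# Folder taken from the question; weeks 3 to 21 inclusive.
paths = table_paths(r"C:\Users\H\Desktop\Files", 3, 21)
print(paths[0].name, paths[-1].name)
```

Each path can then be passed to pd.read_excel, which accepts path objects as well as strings. Collecting the resulting DataFrames in a list and calling pd.concat once at the end also avoids re-copying the growing DataFrame on every pass through the loop.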
