Home > front end >  looping over multiple excel sheets to create concat dataframe
looping over multiple excel sheets to create concat dataframe

Time:05-20

I have an excel sheet that has multiple sheets each corresponding to 1 year of monthly data from Jan-Dec for a specific index (A, B, C, D, E, F, G)

For example:

Sheet_2022
    Jan-2022 Feb-2022 Mar-2022 Apr-2022 May-2022 Jun-2022 Jul-2022 Aug-2022 Sep-2022 Oct-2022 Nov-2022 Dec-2022
A       1       6       9           5       8       5       8       4           9       4       0       3
B       6       5       9           5       8       5       8       4           9       4       0       3
C       32      6       9           5       8       5       8       4           9       4       0       3
D       1       7       9           5       8       5       8       4           9       4       0       3
E       4       8       9           5       8       5       8       4           9       4       0       3
F       8       9       9           5       8       5       8       4           9       4       0       3
Sheet_2023
    Jan-2023 Feb-2023 Mar-2023 Apr-2023 May-2023 Jun-2023 Jul-2023 Aug-2023 Sep-2023 Oct-2023 Nov-2023 Dec-2023
A       2       6       5           5       8       5       8       4           9       4       0       3
B       5       5       34          5       8       5       8       4           9       4       0       3
C       32      6       3           5       8       5       8       4           9       4       0       3
D       8       7       2           5       8       5       8       4           9       4       0       3
E       9       8       5           5       8       5       8       4           9       4       0       3
F       0       9       8           5       8       5       8       4           9       4       0       3
Sheet_2024
    Jan-2024 Feb-2024 Mar-2024 Apr-2024 May-2024 Jun-2024 Jul-2024 Aug-2024 Sep-2024 Oct-2024 Nov-2024 Dec-2024
A       1       6       2           5       8       5       8       4           9       4       0       3
B       6       5       3           5       8       5       8       4           9       4       0       3
C       32      6       4           5       8       5       8       4           9       4       0       3
D       1       7       5           5       8       5       8       4           9       4       0       3
E       4       8       6           5       8       5       8       4           9       4       0       3
G       8       9       7           5       8       5       8       4           9       4       0       3

I am using the below code to read all sheets and save it in same dataframe:

for i in result:
    df = pd.read_excel(filename_Value, sheet_name = i)
    df_Value.append(df)
df_Value = pd.concat(df_Value)

But this is giving data is somewhat different format:

df_Value
    Jan-2022 Feb-2022 Mar-2022  Jan-2023 Feb-2023 Mar-2023  Jan-2024 Feb-2024 Mar-2024
A     1         6       9                       
B     6         5       9                       
C     32        6       9                       
D     1         7       9                       
E     4         8       9                       
F     8         9       9                       
A                                   2       6       5                           
B                                   5       5       34                          
C                                   32      6       3                           
D                                   8       7       2                           
E                                   9       8       5                           
F                                   0       9       8                           
A                                                               1       6       2           
B                                                               6       5       3           
C                                                               32      6       4           
D                                                               1       7       5           
E                                                               4       8       6           
G                                                               8       9       7           

I am looking to create a single dataframe where the values will be like:

df_Value
    Jan-2022 Feb-2022 Mar-2022  Jan-2023 Feb-2023 Mar-2023  Jan-2024 Feb-2024 Mar-2024
A     1         6       9           2       6       5           1       6       2                       
B     6         5       9           5       5       34          6       5       3                       
C     32        6       9           32      6       3           32      6       4                       
D     1         7       9           8       7       2           1       7       5                       
E     4         8       9           9       8       5           4       8       6                       
F     8         9       9           0       9       8                                   
G                                                               8       9       7                       

Is there a easier way to do this?

CodePudding user response:

Pandas concat or merge will do exactly what you want.

concat will be more in line with what you are asking since merge only works with two dataframes at a time.

from io import StringIO

import pandas as pd

t22_1 = StringIO("""\
i,Jan22,Feb22,Mar22
A,1,4,2
B,6,2,3
C,7,3,0
""")

t22_2 = StringIO("""\
i,Apr22,May22,Jun22
A,0,5,7
B,6,3,8
D,1,7,9
""")

df22_1 = pd.read_csv(t22_1, index_col="i")

df22_2 = pd.read_csv(t22_2, index_col="i")

pd.concat([df22_1, df22_2], axis=1)
# or
pd.merge(df22_1, df22_2, left_index=True, right_index=True, how="outer")

--
   Jan22  Feb22  Mar22  Apr22  May22  Jun22
i                                          
A    1.0    4.0    2.0    0.0    5.0    7.0
B    6.0    2.0    3.0    6.0    3.0    8.0
C    7.0    3.0    0.0    NaN    NaN    NaN
D    NaN    NaN    NaN    1.0    7.0    9.0

NB. I use StringIO to avoid writing file, keep on loading data as you do now.

CodePudding user response:

There's a way you can try, but I can't guarantee it is a better solution.

data = {'jan21':[1,2,3,4],'feb21':[2,3,4,5]}
data1 = {'jan22':[1,2,3,4],'feb22':[2,3,4,5]}
df = pd.DataFrame(data)
df1 = pd.DataFrame(data1)

df[list(df1.columns)]=df1
print(df)

   jan21  feb21  jan22  feb22
0      1      2      1      2
1      2      3      2      3
2      3      4      3      4
3      4      5      4      5

Since you have created all yours dataframes from the same sheet, I can think of using list(df.columns) to merge your dataframes together.

dataframe = pd.DataFrame()
for i in result:
    df = pd.read_excel(filename_Value, sheet_name = i)
    dataframe[list(df.columns)] = df
  • Related