I have an excel sheet that has multiple sheets each corresponding to 1 year of monthly data from Jan-Dec for a specific index (A, B, C, D, E, F, G)
For example:
Sheet_2022
Jan-2022 Feb-2022 Mar-2022 Apr-2022 May-2022 Jun-2022 Jul-2022 Aug-2022 Sep-2022 Oct-2022 Nov-2022 Dec-2022
A 1 6 9 5 8 5 8 4 9 4 0 3
B 6 5 9 5 8 5 8 4 9 4 0 3
C 32 6 9 5 8 5 8 4 9 4 0 3
D 1 7 9 5 8 5 8 4 9 4 0 3
E 4 8 9 5 8 5 8 4 9 4 0 3
F 8 9 9 5 8 5 8 4 9 4 0 3
Sheet_2023
Jan-2023 Feb-2023 Mar-2023 Apr-2023 May-2023 Jun-2023 Jul-2023 Aug-2023 Sep-2023 Oct-2023 Nov-2023 Dec-2023
A 2 6 5 5 8 5 8 4 9 4 0 3
B 5 5 34 5 8 5 8 4 9 4 0 3
C 32 6 3 5 8 5 8 4 9 4 0 3
D 8 7 2 5 8 5 8 4 9 4 0 3
E 9 8 5 5 8 5 8 4 9 4 0 3
F 0 9 8 5 8 5 8 4 9 4 0 3
Sheet_2024
Jan-2024 Feb-2024 Mar-2024 Apr-2024 May-2024 Jun-2024 Jul-2024 Aug-2024 Sep-2024 Oct-2024 Nov-2024 Dec-2024
A 1 6 2 5 8 5 8 4 9 4 0 3
B 6 5 3 5 8 5 8 4 9 4 0 3
C 32 6 4 5 8 5 8 4 9 4 0 3
D 1 7 5 5 8 5 8 4 9 4 0 3
E 4 8 6 5 8 5 8 4 9 4 0 3
G 8 9 7 5 8 5 8 4 9 4 0 3
I am using the below code to read all sheets and save it in same dataframe:
for i in result:
df = pd.read_excel(filename_Value, sheet_name = i)
df_Value.append(df)
df_Value = pd.concat(df_Value)
But this is giving data is somewhat different format:
df_Value
Jan-2022 Feb-2022 Mar-2022 Jan-2023 Feb-2023 Mar-2023 Jan-2024 Feb-2024 Mar-2024
A 1 6 9
B 6 5 9
C 32 6 9
D 1 7 9
E 4 8 9
F 8 9 9
A 2 6 5
B 5 5 34
C 32 6 3
D 8 7 2
E 9 8 5
F 0 9 8
A 1 6 2
B 6 5 3
C 32 6 4
D 1 7 5
E 4 8 6
G 8 9 7
I am looking to create a single dataframe where the values will be like:
df_Value
Jan-2022 Feb-2022 Mar-2022 Jan-2023 Feb-2023 Mar-2023 Jan-2024 Feb-2024 Mar-2024
A 1 6 9 2 6 5 1 6 2
B 6 5 9 5 5 34 6 5 3
C 32 6 9 32 6 3 32 6 4
D 1 7 9 8 7 2 1 7 5
E 4 8 9 9 8 5 4 8 6
F 8 9 9 0 9 8
G 8 9 7
Is there a easier way to do this?
CodePudding user response:
Pandas concat
or merge
will do exactly what you want.
concat
will be more in line with what you are asking since merge only works with two dataframes at a time.
from io import StringIO
import pandas as pd
t22_1 = StringIO("""\
i,Jan22,Feb22,Mar22
A,1,4,2
B,6,2,3
C,7,3,0
""")
t22_2 = StringIO("""\
i,Apr22,May22,Jun22
A,0,5,7
B,6,3,8
D,1,7,9
""")
df22_1 = pd.read_csv(t22_1, index_col="i")
df22_2 = pd.read_csv(t22_2, index_col="i")
pd.concat([df22_1, df22_2], axis=1)
# or
pd.merge(df22_1, df22_2, left_index=True, right_index=True, how="outer")
--
Jan22 Feb22 Mar22 Apr22 May22 Jun22
i
A 1.0 4.0 2.0 0.0 5.0 7.0
B 6.0 2.0 3.0 6.0 3.0 8.0
C 7.0 3.0 0.0 NaN NaN NaN
D NaN NaN NaN 1.0 7.0 9.0
NB. I use StringIO to avoid writing file, keep on loading data as you do now.
CodePudding user response:
There's a way you can try, but I can't guarantee it is a better solution.
data = {'jan21':[1,2,3,4],'feb21':[2,3,4,5]}
data1 = {'jan22':[1,2,3,4],'feb22':[2,3,4,5]}
df = pd.DataFrame(data)
df1 = pd.DataFrame(data1)
df[list(df1.columns)]=df1
print(df)
jan21 feb21 jan22 feb22
0 1 2 1 2
1 2 3 2 3
2 3 4 3 4
3 4 5 4 5
Since you have created all yours dataframes from the same sheet, I can think of using list(df.columns)
to merge your dataframes together.
dataframe = pd.DataFrame()
for i in result:
df = pd.read_excel(filename_Value, sheet_name = i)
dataframe[list(df.columns)] = df