I have the following dataframe:
import pandas as pd
lst = [['2021', '2021-11-01', 'A', 'AA', 1.02],
       ['2021', '2021-11-01', 'B', 'BB', 1.1],
       ['2021', '2021-12-01', 'A', 'AA', 1.3],
       ['2021', '2021-12-01', 'B', 'BB', 1.25],
       ['2022', '2022-01-01', 'A', 'AA', 1.25],
       ['2022', '2022-01-01', 'B', 'BB', 1.4]]
df2 = pd.DataFrame(lst, columns=['YEAR', 'Month', 'P1', 'P2', 'factor'])
I would like to compute the cumulative product of the factor column month by month, for each combination of P1 and P2, restarting every year. Below is what I would like to get.
lst=[['2021','2021-11-01','A','AA',1.02,1.02],['2021','2021-11-01','B','BB',1.1,1.1],['2021','2021-12-01','A','AA',1.3,1.326],['2021','2021-12-01','B','BB',1.25,1.375],['2022','2022-01-01','A','AA',1.25,1.25],['2022','2022-01-01','B','BB',1.4,1.4]]
df2=pd.DataFrame(lst,columns=['YEAR','Month','P1','P2','factor','cumfactor'])
I tried groupby with cumprod, but it did not work.
Thank you for your help
CodePudding user response:
Use groupby on ['YEAR', 'P1', 'P2'] and cumprod:
df2['cumfactor'] = df2.groupby(['YEAR', 'P1', 'P2'])['factor'].cumprod()
NB: make sure the dataframe is sorted by YEAR/Month before computing the cumulative product: df2 = df2.sort_values(by=['YEAR', 'Month'])
output:
   YEAR       Month P1  P2  factor  cumfactor
0  2021  2021-11-01  A  AA    1.02      1.020
1  2021  2021-11-01  B  BB    1.10      1.100
2  2021  2021-12-01  A  AA    1.30      1.326
3  2021  2021-12-01  B  BB    1.25      1.375
4  2022  2022-01-01  A  AA    1.25      1.250
5  2022  2022-01-01  B  BB    1.40      1.400
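To illustrate why the sort order matters, here is a minimal sketch using the data from the question. The reversed frame `shuffled` is a made-up scenario, not something from the original post. cumprod multiplies in row order within each group, so the running product only follows the calendar once the rows are sorted.

```python
import pandas as pd

lst = [['2021', '2021-11-01', 'A', 'AA', 1.02],
       ['2021', '2021-11-01', 'B', 'BB', 1.1],
       ['2021', '2021-12-01', 'A', 'AA', 1.3],
       ['2021', '2021-12-01', 'B', 'BB', 1.25],
       ['2022', '2022-01-01', 'A', 'AA', 1.25],
       ['2022', '2022-01-01', 'B', 'BB', 1.4]]
df2 = pd.DataFrame(lst, columns=['YEAR', 'Month', 'P1', 'P2', 'factor'])

keys = ['YEAR', 'P1', 'P2']

# Hypothetical scenario: the rows arrive in reverse order.
shuffled = df2.iloc[::-1].reset_index(drop=True)

# Without sorting, December is multiplied before November within each group,
# so the December rows carry only their own factor (e.g. 1.3, not 1.326).
unsorted_result = shuffled.assign(
    cumfactor=shuffled.groupby(keys)['factor'].cumprod())

# Sorting first restores chronological order inside each group.
# P1 and P2 are added to the sort keys only to make the row order deterministic.
sorted_df = shuffled.sort_values(by=['YEAR', 'Month', 'P1', 'P2'],
                                 ignore_index=True)
sorted_result = sorted_df.assign(
    cumfactor=sorted_df.groupby(keys)['factor'].cumprod())

print(unsorted_result)
print(sorted_result)
```

In the unsorted result the final product of each group ends up on the November row instead of the December row, which is easy to miss because the numbers themselves look plausible.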
CodePudding user response:
Use GroupBy.cumprod, grouping by columns ['YEAR', 'P1', 'P2'] and processing column factor:
# if necessary, convert Month to datetime and sort first
#df2['Month'] = pd.to_datetime(df2['Month'])
#df2 = df2.sort_values(by=['YEAR', 'Month'], ignore_index=True)
df2['cumfactor'] = df2.groupby(['YEAR', 'P1', 'P2'])['factor'].cumprod()
print(df2)
   YEAR       Month P1  P2  factor  cumfactor
0  2021  2021-11-01  A  AA    1.02      1.020
1  2021  2021-11-01  B  BB    1.10      1.100
2  2021  2021-12-01  A  AA    1.30      1.326
3  2021  2021-12-01  B  BB    1.25      1.375
4  2022  2022-01-01  A  AA    1.25      1.250
5  2022  2022-01-01  B  BB    1.40      1.400
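The choice of grouping keys decides where the running product restarts. As a rough sketch on the question's data, the comparison below contrasts grouping with YEAR (the product resets each January, as requested) against a hypothetical variant without YEAR, where the product carries over into 2022. The column names `per_year` and `across_years` are illustrative only.

```python
import pandas as pd

lst = [['2021', '2021-11-01', 'A', 'AA', 1.02],
       ['2021', '2021-11-01', 'B', 'BB', 1.1],
       ['2021', '2021-12-01', 'A', 'AA', 1.3],
       ['2021', '2021-12-01', 'B', 'BB', 1.25],
       ['2022', '2022-01-01', 'A', 'AA', 1.25],
       ['2022', '2022-01-01', 'B', 'BB', 1.4]]
df2 = pd.DataFrame(lst, columns=['YEAR', 'Month', 'P1', 'P2', 'factor'])

# Convert to datetime and sort so the products follow calendar order.
df2['Month'] = pd.to_datetime(df2['Month'])
df2 = df2.sort_values(by=['YEAR', 'Month'], ignore_index=True)

# YEAR in the keys: each (YEAR, P1, P2) group starts its own product.
per_year = df2.groupby(['YEAR', 'P1', 'P2'])['factor'].cumprod()

# Hypothetical variant without YEAR: the product continues across years.
across_years = df2.groupby(['P1', 'P2'])['factor'].cumprod()

out = df2.assign(per_year=per_year, across_years=across_years)
print(out)
```

For the A/AA series, the January 2022 row is 1.25 in `per_year` but 1.02 * 1.3 * 1.25 = 1.6575 in `across_years`, so dropping YEAR from the keys changes the result only from the second year onward.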