I have got the following dataframe:
lst=[['01012021','A',100,'NaN'],['01012021','B',120,'NaN'],['01022021','A',140,5],['01022021','B',160,12],['01032021','A',180,20],['01032021','B',200,25]]
df1=pd.DataFrame(lst,columns=['Date','FN','AuM','NNA'])
I would like to generate a new column (AuM_2) which takes the starting value in column AuM and accumulates the entries in column NNA by the column Product. The end result should look like the below mentioned dataframe:
lst=[['01012021','A',100,'NaN','NaN'],['01012021','B',120,'NaN','NaN'],['01022021','A',140,5,105],['01022021','B',160,12,132],['01032021','A',180,20,125],['01032021','B',200,25,157]]
df2=pd.DataFrame(lst,columns=['Date','FN','AuM','NNA','AuM_2'])
Do you have any idea how I could generate column AuM_2 with groupby? Thank you for your assistance.
CodePudding user response:
Sum values from GroupBy.cumsum
and first values per groups by GroupBy.transform
:
df1['NNA'] = pd.to_numeric(df1['NNA'], errors='coerce')
df1['AuM'] = pd.to_numeric(df1['AuM'], errors='coerce')
g = df1.groupby('FN')
df1['AuM_2'] = g['NNA'].cumsum() g['AuM'].transform('first')
print (df1)
Date FN AuM NNA AuM_2
0 01012021 A 100 NaN NaN
1 01012021 B 120 NaN NaN
2 01022021 A 140 5.0 105.0
3 01022021 B 160 12.0 132.0
4 01032021 A 180 20.0 125.0
5 01032021 B 200 25.0 157.0