Home > Software engineering >  how to add new row into each group of groupby in PANDAS , one of the value of that row is sum of val
how to add new row into each group of groupby in PANDAS , one of the value of that row is sum of val

Time:11-08

let's say I have a data frame like this

eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212

I wanted to add a new row into each group of groupby(by=['eff_date','mdl_cd','ast_cd']) in which column value for eff_date,mdl_cd and ast_cd will remain same but for prop_cd value become Hlds and value value column become sum of value of that group e.g. for first group value of value column will be (-0.1234 0.5123 -0.7612) i.e. -0.3723

hence the output will be like this

eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Agri,Hlds,-0.3723         row added   (sum of value in that group)

2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Engy,Hlds,0.5101          row added  (sum of value in that group)

2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Industry,Hlds,-0.7634      row added (sum of value in that group)

2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-22,Comm,Metal,Hlds,-0.1277         row added (sum of value in that group)

2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Agri,Hlds,0.979           row added (sum of value in that group)

2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Engy,Hlds,0.0101          row added (sum of value in that group)

2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Industry,Hlds,-0.2334     row added (sum of value in that group)

2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
2021-09-23,Equity,Metal,Hlds,1.1747         row added (sum of value in that group)

how to perform this computation using pandas

CodePudding user response:

You can create a dataframe with the sum of each group by .groupby() and .sum(), set the prop_cd as Hlds by .assign().

Then, concat with the original dataframe by pd.concat() and sort the columns to put the sum rows back together with their respective groups by .sort_values(), as follows:

df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')

df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)

Result:

print(df_out)

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Agri    Hlds -0.3723
4   2021-09-22    Comm      Engy     Car  0.1212
5   2021-09-22    Comm      Engy    Fund -0.1234
6   2021-09-22    Comm      Engy     Mmt  0.5123
7   2021-09-22    Comm      Engy    Hlds  0.5101
8   2021-09-22    Comm  Industry     Car -0.7612
9   2021-09-22    Comm  Industry    Fund  0.1212
10  2021-09-22    Comm  Industry     Mmt -0.1234
11  2021-09-22    Comm  Industry    Hlds -0.7634
12  2021-09-22    Comm     Metal     Car  0.5123
13  2021-09-22    Comm     Metal    Fund -0.7612
14  2021-09-22    Comm     Metal     Mmt  0.1212
15  2021-09-22    Comm     Metal    Hlds -0.1277
16  2021-09-23  Equity      Agri     Car  0.6541
17  2021-09-23  Equity      Agri    Fund  0.5123
18  2021-09-23  Equity      Agri     Mmt -0.1874
19  2021-09-23  Equity      Agri    Hlds  0.9790
20  2021-09-23  Equity      Engy     Car  0.1212
21  2021-09-23  Equity      Engy    Fund -0.6234
22  2021-09-23  Equity      Engy     Mmt  0.5123
23  2021-09-23  Equity      Engy    Hlds  0.0101
24  2021-09-23  Equity  Industry     Car -0.1612
25  2021-09-23  Equity  Industry    Fund  0.1212
26  2021-09-23  Equity  Industry     Mmt -0.1934
27  2021-09-23  Equity  Industry    Hlds -0.2334
28  2021-09-23  Equity     Metal     Car  0.5123
29  2021-09-23  Equity     Metal    Fund  0.5412
30  2021-09-23  Equity     Metal     Mmt  0.1212
31  2021-09-23  Equity     Metal    Hlds  1.1747

Setup

df = pd.read_clipboard(',')

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Engy     Car  0.1212
4   2021-09-22    Comm      Engy    Fund -0.1234
5   2021-09-22    Comm      Engy     Mmt  0.5123
6   2021-09-22    Comm  Industry     Car -0.7612
7   2021-09-22    Comm  Industry    Fund  0.1212
8   2021-09-22    Comm  Industry     Mmt -0.1234
9   2021-09-22    Comm     Metal     Car  0.5123
10  2021-09-22    Comm     Metal    Fund -0.7612
11  2021-09-22    Comm     Metal     Mmt  0.1212
12  2021-09-23  Equity      Agri     Car  0.6541
13  2021-09-23  Equity      Agri    Fund  0.5123
14  2021-09-23  Equity      Agri     Mmt -0.1874
15  2021-09-23  Equity      Engy     Car  0.1212
16  2021-09-23  Equity      Engy    Fund -0.6234
17  2021-09-23  Equity      Engy     Mmt  0.5123
18  2021-09-23  Equity  Industry     Car -0.1612
19  2021-09-23  Equity  Industry    Fund  0.1212
20  2021-09-23  Equity  Industry     Mmt -0.1934
21  2021-09-23  Equity     Metal     Car  0.5123
22  2021-09-23  Equity     Metal    Fund  0.5412
23  2021-09-23  Equity     Metal     Mmt  0.1212

Interim result:

print(df_sum)

     eff_date  mdl_cd    ast_cd   value prop_cd
0  2021-09-22    Comm      Agri -0.3723    Hlds
1  2021-09-22    Comm      Engy  0.5101    Hlds
2  2021-09-22    Comm  Industry -0.7634    Hlds
3  2021-09-22    Comm     Metal -0.1277    Hlds
4  2021-09-23  Equity      Agri  0.9790    Hlds
5  2021-09-23  Equity      Engy  0.0101    Hlds
6  2021-09-23  Equity  Industry -0.2334    Hlds
7  2021-09-23  Equity     Metal  1.1747    Hlds
  • Related