let's say I have a data frame like this
eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
I wanted to add a new row into each group of groupby(by=['eff_date','mdl_cd','ast_cd'])
in which column value for eff_date,mdl_cd
and ast_cd
will remain same
but for prop_cd
value become Hlds
and value value column become sum of value of that group e.g. for first group value of value column will be (-0.1234 0.5123 -0.7612) i.e. -0.3723
hence the output will be like this
eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Agri,Hlds,-0.3723 row added (sum of value in that group)
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Engy,Hlds,0.5101 row added (sum of value in that group)
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Industry,Hlds,-0.7634 row added (sum of value in that group)
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-22,Comm,Metal,Hlds,-0.1277 row added (sum of value in that group)
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Agri,Hlds,0.979 row added (sum of value in that group)
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123
2021-09-23,Equity,Engy,Hlds,0.0101 row added (sum of value in that group)
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Industry,Hlds,-0.2334 row added (sum of value in that group)
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
2021-09-23,Equity,Metal,Hlds,1.1747 row added (sum of value in that group)
how to perform this computation using pandas
CodePudding user response:
You can create a dataframe with the sum of each group by .groupby()
and .sum()
, set the prop_cd
as Hlds
by .assign()
.
Then, concat with the original dataframe by pd.concat()
and sort the columns to put the sum rows back together with their respective groups by .sort_values()
, as follows:
df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')
df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)
Result:
print(df_out)
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Agri Hlds -0.3723
4 2021-09-22 Comm Engy Car 0.1212
5 2021-09-22 Comm Engy Fund -0.1234
6 2021-09-22 Comm Engy Mmt 0.5123
7 2021-09-22 Comm Engy Hlds 0.5101
8 2021-09-22 Comm Industry Car -0.7612
9 2021-09-22 Comm Industry Fund 0.1212
10 2021-09-22 Comm Industry Mmt -0.1234
11 2021-09-22 Comm Industry Hlds -0.7634
12 2021-09-22 Comm Metal Car 0.5123
13 2021-09-22 Comm Metal Fund -0.7612
14 2021-09-22 Comm Metal Mmt 0.1212
15 2021-09-22 Comm Metal Hlds -0.1277
16 2021-09-23 Equity Agri Car 0.6541
17 2021-09-23 Equity Agri Fund 0.5123
18 2021-09-23 Equity Agri Mmt -0.1874
19 2021-09-23 Equity Agri Hlds 0.9790
20 2021-09-23 Equity Engy Car 0.1212
21 2021-09-23 Equity Engy Fund -0.6234
22 2021-09-23 Equity Engy Mmt 0.5123
23 2021-09-23 Equity Engy Hlds 0.0101
24 2021-09-23 Equity Industry Car -0.1612
25 2021-09-23 Equity Industry Fund 0.1212
26 2021-09-23 Equity Industry Mmt -0.1934
27 2021-09-23 Equity Industry Hlds -0.2334
28 2021-09-23 Equity Metal Car 0.5123
29 2021-09-23 Equity Metal Fund 0.5412
30 2021-09-23 Equity Metal Mmt 0.1212
31 2021-09-23 Equity Metal Hlds 1.1747
Setup
df = pd.read_clipboard(',')
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Engy Car 0.1212
4 2021-09-22 Comm Engy Fund -0.1234
5 2021-09-22 Comm Engy Mmt 0.5123
6 2021-09-22 Comm Industry Car -0.7612
7 2021-09-22 Comm Industry Fund 0.1212
8 2021-09-22 Comm Industry Mmt -0.1234
9 2021-09-22 Comm Metal Car 0.5123
10 2021-09-22 Comm Metal Fund -0.7612
11 2021-09-22 Comm Metal Mmt 0.1212
12 2021-09-23 Equity Agri Car 0.6541
13 2021-09-23 Equity Agri Fund 0.5123
14 2021-09-23 Equity Agri Mmt -0.1874
15 2021-09-23 Equity Engy Car 0.1212
16 2021-09-23 Equity Engy Fund -0.6234
17 2021-09-23 Equity Engy Mmt 0.5123
18 2021-09-23 Equity Industry Car -0.1612
19 2021-09-23 Equity Industry Fund 0.1212
20 2021-09-23 Equity Industry Mmt -0.1934
21 2021-09-23 Equity Metal Car 0.5123
22 2021-09-23 Equity Metal Fund 0.5412
23 2021-09-23 Equity Metal Mmt 0.1212
Interim result:
print(df_sum)
eff_date mdl_cd ast_cd value prop_cd
0 2021-09-22 Comm Agri -0.3723 Hlds
1 2021-09-22 Comm Engy 0.5101 Hlds
2 2021-09-22 Comm Industry -0.7634 Hlds
3 2021-09-22 Comm Metal -0.1277 Hlds
4 2021-09-23 Equity Agri 0.9790 Hlds
5 2021-09-23 Equity Engy 0.0101 Hlds
6 2021-09-23 Equity Industry -0.2334 Hlds
7 2021-09-23 Equity Metal 1.1747 Hlds