I would like to do an "incremental groupby". I have the following dataframe:
v1 increment
0.1 0
0.5 0
0.42 1
0.4 1
0.3 2
0.7 2
I would like to compute the average of column v1, by incrementally grouping by the column "increment". For instance when I do the first groupby for 0, I would get the average of the first two rows. The for the second groupby, I would get the average of the first 4 rows ( both increment= 0 and 1), then for the third groupby I would get the average of increment = 0,1 and 2)
Any idea how I could do that efficiently?
Expected output:
group average of v1
0 0.3
1 0.355
2 0.403
CodePudding user response:
You can compute the cumulated sum and the cumulated size, then divide:
g = df.groupby('increment')['v1'] # set up a grouper for efficiency
out = (g.sum().cumsum() # cumulated sum
.div(g.size().cumsum()) # divide by cumulated size
.reset_index(name='average of v1')
)
output:
increment average of v1
0 0 0.300000
1 1 0.355000
2 2 0.403333
CodePudding user response:
You can do a cumsum of v1
value then do a cumsum of each group size
cumsum = df.groupby('increment')['v1'].sum().cumsum()
cumsize = df.groupby('increment')['v1'].size().cumsum()
out = (cumsum.div(cumsize)
.to_frame('average of v1')
.reset_index())
print(out)
increment average of v1
0 0 0.300000
1 1 0.355000
2 2 0.403333