Home > Software engineering >  Efficient way to do an incremental groupby in pandas
Efficient way to do an incremental groupby in pandas

Time:10-21

I would like to do an "incremental groupby". I have the following dataframe:

v1     increment
0.1     0
0.5     0
0.42    1
0.4     1
0.3     2
0.7     2

I would like to compute the average of column v1, by incrementally grouping by the column "increment". For instance when I do the first groupby for 0, I would get the average of the first two rows. The for the second groupby, I would get the average of the first 4 rows ( both increment= 0 and 1), then for the third groupby I would get the average of increment = 0,1 and 2)

Any idea how I could do that efficiently?

Expected output:

group  average of v1
0         0.3
1         0.355
2         0.403

CodePudding user response:

You can compute the cumulated sum and the cumulated size, then divide:

g = df.groupby('increment')['v1']  # set up a grouper for efficiency
out = (g.sum().cumsum()        # cumulated sum
       .div(g.size().cumsum()) # divide by cumulated size
       .reset_index(name='average of v1')
       )

output:

   increment  average of v1
0          0       0.300000
1          1       0.355000
2          2       0.403333

CodePudding user response:

You can do a cumsum of v1 value then do a cumsum of each group size

cumsum = df.groupby('increment')['v1'].sum().cumsum()
cumsize = df.groupby('increment')['v1'].size().cumsum()

out = (cumsum.div(cumsize)
       .to_frame('average of v1')
       .reset_index())
print(out)

   increment  average of v1
0          0       0.300000
1          1       0.355000
2          2       0.403333
  • Related