How can I apply an expanding window to the names of groupby results?

Time:11-15

I would like to use pandas to group a dataframe by one column, and then run an expanding window calculation on the groups. Imagine the following dataframe:

G Val
A 0
A 1
A 2
B 3
B 4
C 5 
C 6 
C 7
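
To make the example reproducible, the frame above could be built like this. It is a minimal sketch: the variable name df matches what the answer below uses, and the integer dtype of Val is an assumption based on the values shown.

```python
import pandas as pd

# Example frame from the question: group label G and value Val
df = pd.DataFrame({
    "G": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Val": [0, 1, 2, 3, 4, 5, 6, 7],
})
```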

What I am looking for is a way to group the data by column G (resulting in groups ['A', 'B', 'C']), and then apply a function first to the items in group A, then to the items in groups A and B, and finally to the items in groups A to C.

For example, if the function is sum, then the result would be

A 3
B 10
C 28

For my problem, the function that is applied needs access to all original items in the dataframe, not only to the aggregates from the groupby.

For example when applying mean, the expected result would be

A 1
B 2
C 3.5

A: mean([0,1,2]), B: mean([0,1,2,3,4]), C: mean([0,1,2,3,4,5,6,7]).

CodePudding user response:

cummean does not exist, so one possible solution is to aggregate the count and sum per group, take the cumulative sum of both, and divide to get the mean:

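# accumulate per-group counts and sums; use a new name so df keeps its G column
totals = df.groupby('G')['Val'].agg(['size', 'sum']).cumsum()
# expanding mean = cumulative sum / cumulative count
s = totals['sum'].div(totals['size'])
print(s)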
A    1.0
B    2.0
C    3.5
dtype: float64
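
As a self-contained sketch of the same idea, the snippet below rebuilds the example frame and derives both the expanding sum and the expanding mean from the cumulative totals. The names totals, expanding_sum and expanding_mean are illustrative, and sort=False is my own addition (not in the answer above) so that groups are accumulated in order of first appearance rather than in sorted label order.

```python
import pandas as pd

df = pd.DataFrame({
    "G": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Val": [0, 1, 2, 3, 4, 5, 6, 7],
})

# Row count and sum per group, then accumulated from the first group onward.
# sort=False (an assumption about the desired order) keeps first-appearance order.
totals = df.groupby("G", sort=False)["Val"].agg(["size", "sum"]).cumsum()

expanding_sum = totals["sum"]                    # sum over all groups seen so far
expanding_mean = totals["sum"] / totals["size"]  # mean over all rows seen so far
```

This only works for functions that can be rebuilt from running totals (sum, count, mean); anything that needs the original rows requires the general approach.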

If a general solution is needed, it is possible to build the expanding lists of group labels first and then apply the function in a dict comprehension, like:

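# each unique label becomes a one-element list; cumsum concatenates them: ['A'], ['A', 'B'], ...
g = df['G'].drop_duplicates().apply(list).cumsum()

# for each expanding label list, select all matching original rows and apply the function
s = pd.Series({x[-1]: df.loc[df['G'].isin(x), 'Val'].mean() for x in g})
print(s)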
A    1.0
B    2.0
C    3.5
dtype: float64
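
The general approach can also be written as an explicit loop, sketched below. The helper name expanding_group_apply is hypothetical, not a pandas function. It grows a plain Python list of labels instead of calling cumsum on a column of lists, and it does not pass labels through list(), which would split a multi-character label such as 'AB' into ['A', 'B'].

```python
import pandas as pd

def expanding_group_apply(df, key, col, func):
    """Hypothetical helper: apply func to df[col] over a growing set of groups."""
    seen = []   # labels included so far, in order of first appearance
    out = {}
    for label in df[key].drop_duplicates():
        seen.append(label)
        # func receives the original rows of every group seen so far
        out[label] = func(df.loc[df[key].isin(seen), col])
    return pd.Series(out)

df = pd.DataFrame({
    "G": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Val": [0, 1, 2, 3, 4, 5, 6, 7],
})

means = expanding_group_apply(df, "G", "Val", lambda v: v.mean())
sums = expanding_group_apply(df, "G", "Val", lambda v: v.sum())
```

Because func sees the underlying Series of values, it can be any function of the original items, not only one that can be derived from per-group aggregates.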