I'm reading some Excel files and I want to calculate the mean of the column 'Duration', but I want to see how the mean changes after recording 1, 2, 3 and so on. For the table below that would be:

Mean1 = (3 + 3 + 3) / 3
Mean2 = (3 + 3 + 3 + 4 + 6) / 5
Mean3 = (3 + 3 + 3 + 4 + 6 + 4 + 3 + 1 + 4) / 9

Each file can have a different number of recordings (but always starting from 1) and a different number of durations for each recording.
Is it possible to write a Python script that does this automatically?
| Recording | Duration |
|---|---|
| 1 | 3 |
| 1 | 3 |
| 1 | 3 |
| 2 | 4 |
| 2 | 6 |
| 3 | 4 |
| 3 | 3 |
| 3 | 1 |
| 3 | 4 |
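For reference, the table above can be reproduced as a pandas DataFrame (assuming the data has already been read in; the literal values here just mirror the example):

```python
import pandas as pd

# Sample data matching the table in the question
df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration":  [3, 3, 3, 4, 6, 4, 3, 1, 4],
})
```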
CodePudding user response:
IIUC, you could `groupby` "Recording" and take the ratio of the cumulated sum to the cumulated count (using `cumsum`):

```python
g = df.groupby('Recording')['Duration']
s = g.sum().cumsum() / g.count().cumsum()
```
output:

```
Recording
1    3.000000
2    3.800000
3    3.444444
Name: Duration, dtype: float64
```
intermediates:

```
>>> g.sum().cumsum()
Recording
1     9
2    19
3    31
Name: Duration, dtype: int64

>>> g.count().cumsum()
Recording
1    3
2    5
3    9
Name: Duration, dtype: int64
```
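The same running mean can also be sketched without the sum/count ratio, using `expanding().mean()` over the whole column and then keeping the last value per recording (the DataFrame construction here is just the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "Recording": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Duration":  [3, 3, 3, 4, 6, 4, 3, 1, 4],
})

# Running mean over all rows seen so far...
running = df["Duration"].expanding().mean()
# ...then keep the last running mean within each recording
result = running.groupby(df["Recording"]).last()
```

This relies on the rows being sorted by `Recording`, which the question's data already is.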
CodePudding user response:
You can compute the average by definition: first get the `sum` and count (`size`) per Recording, take the cumulative sum of both with `DataFrame.cumsum`, and for the mean divide the `sum` column by the `size` column:

```python
df = df.groupby('Recording')['Duration'].agg(['sum','size']).cumsum().reset_index()
df['avg'] = df.pop('sum').div(df.pop('size'))
print(df)
```

```
   Recording       avg
0          1  3.000000
1          2  3.800000
2          3  3.444444
```
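To run this over several Excel files, as the question describes, either answer's logic can be wrapped in a small loop. This is a sketch only: the file names are hypothetical, and `pd.read_excel` needs an engine such as `openpyxl` installed.

```python
import os
import pandas as pd

def cumulative_means(df):
    """Cumulative mean of 'Duration' after each recording."""
    g = df.groupby("Recording")["Duration"]
    return g.sum().cumsum() / g.count().cumsum()

# Hypothetical file names; skip any that don't exist
for path in ["session1.xlsx", "session2.xlsx"]:
    if not os.path.exists(path):
        continue
    print(path)
    print(cumulative_means(pd.read_excel(path)))
```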