for example i have loop in data frame, in each group-by(id) when i am in current row[i] i want to get average of all previous rows in same group
eg// in (a) group when i am in row 4 i want to get average for (20,18,19) when i am in row 5 i want to get average for (20,18,19,20)
and so on for all rows and groups in data frame
id | total |
---|---|
a | 20 |
a | 18 |
a | 19 |
a | 20 |
a | 21 |
a | 15 |
b | 13 |
b | nan |
b | 15 |
b | 5 |
CodePudding user response:
you can use expanding()
to define a calculation window and groupby()
to group by your indexes, and then reset your indexes to recover the original shape of your dataframe like :
df = df.groupby('id').expanding(1).mean()
df.reset_index(inplace=True)
df.drop('level_1', axis=1, inplace=True)
print(df)
output:
id total
0 a 20.000000
1 a 19.000000
2 a 19.000000
3 a 19.250000
4 a 19.600000
5 a 18.833333
6 b 13.000000
7 b 13.000000
8 b 14.000000
9 b 11.000000
CodePudding user response:
You can use pandas groupby()
and expanding()
for this. Groupby id, use expanding to get a running output, then mean() for the average:
df.groupby('id').expanding().mean().reset_index()
id total
0 a 20.000000
1 a 19.000000
2 a 19.000000
3 a 19.250000
4 a 19.600000
5 a 18.833333
6 b 13.000000
7 b 13.000000
8 b 14.000000
9 b 11.000000
To get the average up to but not including each row, you can just group again, and shift these numbers by 1. Full line of code:
out = df.groupby('id').expanding().mean().reset_index().groupby('id').shift(1)
level_1 total
0 NaN NaN
1 0.0 20.00
2 1.0 19.00
3 2.0 19.00
4 3.0 19.25
5 4.0 19.60
6 NaN NaN
7 6.0 13.00
8 7.0 13.00
9 8.0 14.00
Then you can increase level_1 by 1, and merge this back to the index of the original dataframe to get the desired result:
out['level_1'] = out['level_1'] 1
(df.reset_index().merge(out, left_on='index', right_on='level_1',
how='left', suffixes=[None, ' average'])[['id', 'total', 'total average']])
id total total average
0 a 20.0 NaN
1 a 18.0 20.00
2 a 19.0 19.00
3 a 20.0 19.00
4 a 21.0 19.25
5 a 15.0 19.60
6 b 13.0 NaN
7 b NaN 13.00
8 b 15.0 13.00
9 b 5.0 14.00
Full code:
out = df.groupby('id').expanding().mean().reset_index().groupby('id').shift(1)
out['level_1'] = out['level_1'] 1
(df.reset_index().merge(out, left_on='index', right_on='level_1',
how='left', suffixes=[None, ' average'])[['id', 'total', 'total average']])