How can i get average between first row and row before current row[i] in each group of data frame fo-CodePudding

for example i have loop in data frame, in each group-by(id) when i am in current row[i] i want to get average of all previous rows in same group

eg// in (a) group when i am in row 4 i want to get average for (20,18,19) when i am in row 5 i want to get average for (20,18,19,20)

and so on for all rows and groups in data frame

id	total
a	20
a	18
a	19
a	20
a	21
a	15
b	13
b	nan
b	15
b	5

CodePudding user response：

you can use expanding()to define a calculation window and groupby() to group by your indexes, and then reset your indexes to recover the original shape of your dataframe like :

df = df.groupby('id').expanding(1).mean()

df.reset_index(inplace=True)
df.drop('level_1', axis=1, inplace=True)

print(df)

output:

  id      total
0  a  20.000000
1  a  19.000000
2  a  19.000000
3  a  19.250000
4  a  19.600000
5  a  18.833333
6  b  13.000000
7  b  13.000000
8  b  14.000000
9  b  11.000000

CodePudding user response：

You can use pandas groupby() and expanding() for this. Groupby id, use expanding to get a running output, then mean() for the average:

df.groupby('id').expanding().mean().reset_index()

  id      total
0  a  20.000000
1  a  19.000000
2  a  19.000000
3  a  19.250000
4  a  19.600000
5  a  18.833333
6  b  13.000000
7  b  13.000000
8  b  14.000000
9  b  11.000000

To get the average up to but not including each row, you can just group again, and shift these numbers by 1. Full line of code:

out = df.groupby('id').expanding().mean().reset_index().groupby('id').shift(1)

   level_1  total
0      NaN    NaN
1      0.0  20.00
2      1.0  19.00
3      2.0  19.00
4      3.0  19.25
5      4.0  19.60
6      NaN    NaN
7      6.0  13.00
8      7.0  13.00
9      8.0  14.00

Then you can increase level_1 by 1, and merge this back to the index of the original dataframe to get the desired result:

out['level_1'] = out['level_1']   1
(df.reset_index().merge(out, left_on='index', right_on='level_1',
                        how='left', suffixes=[None, ' average'])[['id', 'total', 'total average']])


  id  total  total average
0  a   20.0            NaN
1  a   18.0          20.00
2  a   19.0          19.00
3  a   20.0          19.00
4  a   21.0          19.25
5  a   15.0          19.60
6  b   13.0            NaN
7  b    NaN          13.00
8  b   15.0          13.00
9  b    5.0          14.00

Full code:

out = df.groupby('id').expanding().mean().reset_index().groupby('id').shift(1)
out['level_1'] = out['level_1']   1
(df.reset_index().merge(out, left_on='index', right_on='level_1',
                        how='left', suffixes=[None, ' average'])[['id', 'total', 'total average']])