Pandas average every ith row of dataframe subset based on column value-CodePudding

I have a dataframe that looks like the following

id       value
11         0
11         3
11         1
11         2
4          3
4          1
4          8
4          2
5          0
5          3
5          1
5          2
4          2
4          0
4          1
4          3
11         2
11         1
11         0
11         3

I'm hoping to average the i_th rows of the N (=4) subsets, that have id that are equal; then, end up with the following

id       value
11         1
11         2
11         0.5
11         2.5
4          2.5
4          0.5
4          4.5
4          2.5
5          0
5          3
5          1
5          2

e.g. for id=11:

[0 2, 3 1, 1 0, 2 3]/2 = [1, 2, 0.5, 2.5]

CodePudding user response：

You can create a grouper that will group each consecutive group of identical numbers, and then for each group, get a cumcount for each group. Then, group by id and that cumcount:

consecutive_id_grouper = df['id'].ne(df['id'].shift(1)).cumsum()
cumcount_grouper = df['id'].groupby(consecutive_id_grouper).cumcount()
avg = df.groupby([df['id'], cumcount_grouper], as_index=False, sort=False)['value'].mean()

Output:

>>> avg
    id  value
0   11    1.0
1   11    2.0
2   11    0.5
3   11    2.5
4    4    2.5
5    4    0.5
6    4    4.5
7    4    2.5
8    5    0.0
9    5    3.0
10   5    1.0
11   5    2.0

CodePudding user response：

Another possible approach by reshaping and taking the mean. Would break if there are multiple levels of replicating groups.

(df.groupby('id', sort=False)
   .apply(lambda x: x['value'].values.reshape(2,-1).T.mean(1) if len(x)==df.id.value_counts().max() else x['value'].values)
   .explode()
   .reset_index(name='value'))

Output

    id value
0   11   1.0
1   11   2.0
2   11   0.5
3   11   2.5
4    4   2.5
5    4   0.5
6    4   4.5
7    4   2.5
8    5     0
9    5     3
10   5     1
11   5     2