I have a dataframe that looks like the following
id value
11 0
11 3
11 1
11 2
4 3
4 1
4 8
4 2
5 0
5 3
5 1
5 2
4 2
4 0
4 1
4 3
11 2
11 1
11 0
11 3
I'm hoping to average the i_th rows of the N (=4) subsets, that have id
that are equal; then, end up with the following
id value
11 1
11 2
11 0.5
11 2.5
4 2.5
4 0.5
4 4.5
4 2.5
5 0
5 3
5 1
5 2
e.g. for id=11:
[0 2, 3 1, 1 0, 2 3]/2 = [1, 2, 0.5, 2.5]
CodePudding user response:
You can create a grouper that will group each consecutive group of identical numbers, and then for each group, get a cumcount
for each group. Then, group by id
and that cumcount:
consecutive_id_grouper = df['id'].ne(df['id'].shift(1)).cumsum()
cumcount_grouper = df['id'].groupby(consecutive_id_grouper).cumcount()
avg = df.groupby([df['id'], cumcount_grouper], as_index=False, sort=False)['value'].mean()
Output:
>>> avg
id value
0 11 1.0
1 11 2.0
2 11 0.5
3 11 2.5
4 4 2.5
5 4 0.5
6 4 4.5
7 4 2.5
8 5 0.0
9 5 3.0
10 5 1.0
11 5 2.0
CodePudding user response:
Another possible approach by reshaping and taking the mean. Would break if there are multiple levels of replicating groups.
(df.groupby('id', sort=False)
.apply(lambda x: x['value'].values.reshape(2,-1).T.mean(1) if len(x)==df.id.value_counts().max() else x['value'].values)
.explode()
.reset_index(name='value'))
Output
id value
0 11 1.0
1 11 2.0
2 11 0.5
3 11 2.5
4 4 2.5
5 4 0.5
6 4 4.5
7 4 2.5
8 5 0
9 5 3
10 5 1
11 5 2