Pandas groupby over consecutive duplicates

Time:12-31

Given a table,

Id Value
1 1
2 2
2 3
3 4
4 5
4 6
2 8
2 3
1 1

Instead of a simple groupby('Id').agg({'Value':'sum'}), which would aggregate over every row with the same Id and yield a table with only four rows, I want to aggregate only over adjacent rows, so that the order in which the table was created is preserved.

The expected output is the following:

Id Value
1 1
2 5
3 4
4 11
2 11
1 1

If this is not possible with pandas groupby, any other kind of trick would also be greatly appreciated.

Note: If the above example is not helpful, what I basically want is to compress the table by aggregating over 'Value'. The aggregation should be done only over duplicate 'Id's that occur immediately one after another.
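For anyone who wants to reproduce the setup, here is a minimal sketch that builds the table from the question and runs the plain groupby described above. The variable names `df` and `plain` are only illustrative.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    "Id":    [1, 2, 2, 3, 4, 4, 2, 2, 1],
    "Value": [1, 2, 3, 4, 5, 6, 8, 3, 1],
})

# A plain groupby merges every row that shares an Id, wherever it appears,
# so the two separate runs of Id 2 (and of Id 1) collapse into one row each
plain = df.groupby("Id", as_index=False).agg({"Value": "sum"})
print(plain)
```

This gives four rows, one per distinct Id, which is the behaviour the question wants to avoid.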

CodePudding user response:

# Label each run of consecutive identical Ids and name the label g. A new run starts wherever Id differs from the previous row.

s = df['Id'].ne(df['Id'].shift()).cumsum().rename('g')

# Group by the run label (plus Id, so it stays as a column), sum, then drop the helper column

df.groupby([s, 'Id']).sum().reset_index().drop(columns=['g'])

Outcome

   Id  Value
0   1      1
1   2      5
2   3      4
3   4     11
4   2     11
5   1      1
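The same idea can be packaged as a small self-contained helper. This is only a sketch: the function name `sum_consecutive` and its parameters are hypothetical, and it assumes the key column contains no missing values.

```python
import pandas as pd

def sum_consecutive(df, key="Id", value="Value"):
    """Sum `value` over each run of consecutive equal `key` values, keeping row order."""
    # True on the first row of every run (the first row is compared against NaN, so it is True)
    starts = df[key].ne(df[key].shift())
    # A running count of run starts gives every row in the same run the same label
    run = starts.cumsum().rename("run")
    # Within a run all keys are equal, so "first" simply recovers the key
    out = df.groupby(run, sort=False).agg({key: "first", value: "sum"})
    return out.reset_index(drop=True)

df = pd.DataFrame({
    "Id":    [1, 2, 2, 3, 4, 4, 2, 2, 1],
    "Value": [1, 2, 3, 4, 5, 6, 8, 3, 1],
})
result = sum_consecutive(df)
print(result)
```

Because the label only changes where Id differs from the previous row, runs of any length (not just pairs) end up in a single group.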

CodePudding user response:

Here is one way:

run = (df['Id'] != df['Id'].shift()).cumsum().rename('run')
out = df.groupby([run, df['Id']])['Value'].sum()
print(out)

Output:

run  Id
1    1      1
2    2      5
3    3      4
4    4     11
5    2     11
6    1      1
Name: Value, dtype: int64
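Since the question also asks for any other kind of trick, here is a sketch without pandas. It relies on `itertools.groupby` from the standard library, which only merges adjacent items with equal keys. The `rows` list is just the question's table written out as (Id, Value) tuples.

```python
from itertools import groupby
from operator import itemgetter

# (Id, Value) pairs from the question, in their original order
rows = [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (4, 6), (2, 8), (2, 3), (1, 1)]

# itertools.groupby starts a new group every time the key changes,
# so only consecutive duplicates are summed together
compressed = [
    (key, sum(value for _, value in run))
    for key, run in groupby(rows, key=itemgetter(0))
]
print(compressed)
```

Starting from a DataFrame, the tuples could be produced with `df.itertuples(index=False, name=None)`. For large tables the vectorised shift/cumsum approach is likely to be faster.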