Keep maximum row in consecutive rows with specific value in column based on another column value

Time:02-19

I have a df.

import pandas as pd 
df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

id_c   Run   Date_diff

1    7      4
1    8      12
1    5      0
1    4      0
2    3      2
2    2      2
2    1      10
3    2      1
3    3      1
3    4      3
3    5      3

For each unique value of id_c, if Date_diff equals 0, 1 or 2 for two consecutive rows, I want to keep only the row with the maximum value in Run.

I tried :

df.loc[df.groupby(['id_c', 'Date_diff'])['Run'].idxmax()]

But this also keeps only the maximum row for values of Date_diff other than 0, 1, 2.
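
To make the problem concrete, here is a small check of that attempt on the sample data. It is only a sketch; the name `attempt` is introduced here for illustration and is not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

# One row is kept per (id_c, Date_diff) pair, whatever the Date_diff value is.
# `attempt` is a hypothetical name used only in this sketch.
attempt = df.loc[df.groupby(['id_c', 'Date_diff'])['Run'].idxmax()]

# Rows 9 and 10 both have id_c == 3 and Date_diff == 3, so they fall into
# the same group and only one of them survives, although both are wanted.
print(attempt.sort_index())
```

Row 9 (id_c 3, Run 4, Date_diff 3) is missing from the result, which is the unwanted behaviour described above.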

The desired output would be :

id_c Run   Date_diff
1    7      4
1    8      12
1    5      0
2    3      2
2    1      10
3    3      1
3    4      3
3    5      3

Thanks!

CodePudding user response:

IIUC, compute a custom group and get the max index per group, then slice:

# get values not in 0/1/2
mask = ~df['Date_diff'].isin([0,1,2])
# group the consecutive 0/1/2 and get id of max Run
idx = df.groupby(['id_c', (mask|mask.shift()).cumsum()])['Run'].idxmax().values

# slice output with max ids
out = df.loc[idx]

output:

    id_c  Run  Date_diff
0      1    7          4
1      1    8         12
2      1    5          0
4      2    3          2
6      2    1         10
8      3    3          1
9      3    4          3
10     3    5          3
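
To see why the grouping key works, it may help to look at the intermediate values. The sketch below follows the same idea, with two assumptions of mine: the names `starts` and `key` are made up for illustration, and `shift(fill_value=False)` is used instead of a plain `shift()` so the shifted Series stays boolean rather than picking up a NaN in the first row.

```python
import pandas as pd

df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

# True for rows whose Date_diff is NOT one of 0, 1, 2
mask = ~df['Date_diff'].isin([0, 1, 2])

# A new group starts at every masked row and at every row right after one.
# fill_value=False is an assumption of this sketch (keeps the dtype boolean).
starts = mask | mask.shift(fill_value=False)

# The running count of group starts gives one label per group:
# rows inside a consecutive 0/1/2 run share the same label.
key = starts.cumsum()
print(df.assign(mask=mask, key=key))

# Index of the max Run per (id_c, key) group, then slice
idx = df.groupby(['id_c', key])['Run'].idxmax().to_numpy()
out = df.loc[idx]
print(out)
```

On the sample data, rows 2 to 5 all get the same `key` value even though rows 2 and 3 belong to id_c 1 and rows 4 and 5 to id_c 2. Including `id_c` in the groupby is what keeps runs from merging across ids.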