Remove consecutive duplicates while keeping the max value

I am trying to remove consecutive duplicates from column X while keeping, for each run of duplicates, the entry with the max value in column Y, so far with no success. The data frame is as follows:

idx	X	Y
0	A	3
1	B	2
2	A	7
3	A	10
4	B	1
5	C	4
6	A	3
7	A	5

What I want to achieve is:

idx	X	Y
0	A	3
1	B	2
2	A	10
3	B	1
4	C	4
5	A	5

Most of the solutions I found just remove the duplicates tout court without accounting for any repeating pattern.

CodePudding user response：

You need an itertools-style groupby, i.e. one that only groups consecutive equal values of X, and then keep the rows where Y is maximal within each group.

>>> df 
   idx  X   Y
0    0  A   3
1    1  B   2
2    2  A   7
3    3  A  10
4    4  B   1
5    5  C   4
6    6  A   3
7    7  A   5
>>> y_max = df.groupby(df['X'].ne(df['X'].shift()).cumsum())['Y'].transform('max')
>>> df[df['Y'] == y_max] 
   idx  X   Y
0    0  A   3
1    1  B   2
3    3  A  10
4    4  B   1
5    5  C   4
7    7  A   5
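
To make the grouping key less opaque, here is a minimal sketch that rebuilds the example frame and splits the one-liner into steps. The variable names `starts`, `run_id` and `result` are only illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "idx": range(8),
    "X": list("ABAABCAA"),
    "Y": [3, 2, 7, 10, 1, 4, 3, 5],
})

# True wherever X differs from the previous row; the first row is
# compared against NaN, so it always starts a new run.
starts = df["X"].ne(df["X"].shift())

# A running count of run starts gives every consecutive run its own label.
run_id = starts.cumsum().rename("run_id")
print(run_id.tolist())  # [1, 2, 3, 3, 4, 5, 6, 6]

# Broadcast each run's maximum Y back onto its rows, then filter.
y_max = df.groupby(run_id)["Y"].transform("max")
result = df[df["Y"] == y_max]
print(result)
```

One thing to keep in mind: if two rows in the same run share the maximal Y, this filter keeps both of them.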

edit:

Initial solution had a bug and only produced the correct idx column by accident.

CodePudding user response：

Alternatively, I'd simply pass the grouping key to groupby and take the max:

df.groupby(df['X'].ne(df['X'].shift()).cumsum(), as_index=False).max()

Or:

df.groupby(df['X'].ne(df['X'].shift()).cumsum()).max().reset_index(drop=True)

Both output:

   idx  X   Y
0    0  A   3
1    1  B   2
2    3  A  10
3    4  B   1
4    5  C   4
5    7  A   5
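
Note that max() is computed for each column independently, so idx and Y line up in the output above only because the largest Y happens to be the last row of every run in this data. A minimal sketch that selects whole rows instead, assuming a unique index and no missing values in Y, could use idxmax. The helper name `keep_run_max` and the `swapped` frame are made up for illustration.

```python
import pandas as pd

def keep_run_max(df, key="X", value="Y"):
    """For each run of consecutive equal `key` values, keep the row with the largest `value`."""
    run_id = df[key].ne(df[key].shift()).cumsum()
    # idxmax gives the index label of the first maximal row in each run.
    winners = df.groupby(run_id)[value].idxmax()
    return df.loc[winners].reset_index(drop=True)

df = pd.DataFrame({
    "idx": range(8),
    "X": list("ABAABCAA"),
    "Y": [3, 2, 7, 10, 1, 4, 3, 5],
})
result = keep_run_max(df)
print(result)

# Made-up variant: the largest Y is the first row of its run.
swapped = pd.DataFrame({"idx": [0, 1], "X": ["A", "A"], "Y": [10, 7]})
print(keep_run_max(swapped))  # keeps the row with idx 0 and Y 10
```

On the `swapped` frame, a column-wise max() would report idx 1 next to Y 10, values that come from two different rows, whereas the idxmax version returns the original row intact. On ties it keeps only the first maximal row of a run.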