My problems is the same as, How to group by continuous records in SQL, only I need a solution in Pandas.
Given a df like
ID Colour
------------
1 Red
2 Red
3 Red
4 Red
5 Red
6 Green
7 Green
8 Green
9 Green
10 Red
11 Red
12 Red
13 Red
14 Green
15 Green
16 Green
17 Blue
18 Blue
19 Red
20 Blue
I want it grouped into
color minId
------------
Red 1
Green 6
Red 10
Green 14
Blue 17
Red 19
Blue 20
It is okay to change the name of the colors (e.g., Green1
)
The solution should generalize into other aggregations other than just min
CodePudding user response:
You can grouping by consecutive values by helper Series created by compared shifted values and cumsum
and then aggregate first
and min
:
g = df['Colour'].ne(df['Colour'].shift()).cumsum()
df = df.groupby(g).agg(color=('Colour','first'), minId=('ID','min')).reset_index(drop=True)
print (df)
color minId
0 Red 1
1 Green 6
2 Red 10
3 Green 14
4 Blue 17
5 Red 19
6 Blue 20