Pandas get multiple first occurrences of each group-CodePudding

I want to get the multiple first occurrences of each group (ie: every time a group appears after another group), given that some groups may appear more than once in the data frame. For example:

col1	col2
AAA	111
AAA	222
AAA	444
BBB	111
BBB	333
CCC	777
CCC	333
AAA	888
AAA	777

I would like to return both "first" occurrences of the group AAA:

col1	col2
AAA	111
BBB	111
CCC	777
AAA	888

I've tried using groupby but it only returns the first occurrence after grouping. Is there a simple way to get this kind of result?

CodePudding user response：

You can use GroupBy.first on a custom group:

df.groupby(df['col1'].ne(df['col1'].shift()).cumsum(), as_index=False).first()

output:

  col1  col2
0  AAA   111
1  BBB   111
2  CCC   777
3  AAA   888

CodePudding user response：

You can use:

df[df['col1'].ne(df['col1'].shift())]

CodePudding user response：

A solution using itertools -

from itertools import groupby
firsts = []
for k, v in groupby(df.to_records(), key=lambda x: x[1]):
    first, *rest = v
    firsts.append(first)
pd.DataFrame.from_records(firsts, columns=df.reset_index().columns, index='index')

Output

      col1  col2
index           
0      AAA   111
3      BBB   111
5      CCC   777
7      AAA   888