Home > Software design >  Pandas get multiple first occurrences of each group
Pandas get multiple first occurrences of each group

Time:07-18

I want to get the multiple first occurrences of each group (ie: every time a group appears after another group), given that some groups may appear more than once in the data frame. For example:

col1 col2
AAA 111
AAA 222
AAA 444
BBB 111
BBB 333
CCC 777
CCC 333
AAA 888
AAA 777

I would like to return both "first" occurrences of the group AAA:

col1 col2
AAA 111
BBB 111
CCC 777
AAA 888

I've tried using groupby but it only returns the first occurrence after grouping. Is there a simple way to get this kind of result?

CodePudding user response:

You can use GroupBy.first on a custom group:

df.groupby(df['col1'].ne(df['col1'].shift()).cumsum(), as_index=False).first()

output:

  col1  col2
0  AAA   111
1  BBB   111
2  CCC   777
3  AAA   888

CodePudding user response:

You can use:

df[df['col1'].ne(df['col1'].shift())]

CodePudding user response:

A solution using itertools -

from itertools import groupby
firsts = []
for k, v in groupby(df.to_records(), key=lambda x: x[1]):
    first, *rest = v
    firsts.append(first)
pd.DataFrame.from_records(firsts, columns=df.reset_index().columns, index='index')

Output

      col1  col2
index           
0      AAA   111
3      BBB   111
5      CCC   777
7      AAA   888
  • Related