I want to get the multiple first occurrences of each group (ie: every time a group appears after another group), given that some groups may appear more than once in the data frame. For example:
col1 | col2 |
---|---|
AAA | 111 |
AAA | 222 |
AAA | 444 |
BBB | 111 |
BBB | 333 |
CCC | 777 |
CCC | 333 |
AAA | 888 |
AAA | 777 |
I would like to return both "first" occurrences of the group AAA:
col1 | col2 |
---|---|
AAA | 111 |
BBB | 111 |
CCC | 777 |
AAA | 888 |
I've tried using groupby but it only returns the first occurrence after grouping. Is there a simple way to get this kind of result?
CodePudding user response:
You can use GroupBy.first
on a custom group:
df.groupby(df['col1'].ne(df['col1'].shift()).cumsum(), as_index=False).first()
output:
col1 col2
0 AAA 111
1 BBB 111
2 CCC 777
3 AAA 888
CodePudding user response:
You can use:
df[df['col1'].ne(df['col1'].shift())]
CodePudding user response:
A solution using itertools
-
from itertools import groupby
firsts = []
for k, v in groupby(df.to_records(), key=lambda x: x[1]):
first, *rest = v
firsts.append(first)
pd.DataFrame.from_records(firsts, columns=df.reset_index().columns, index='index')
Output
col1 col2
index
0 AAA 111
3 BBB 111
5 CCC 777
7 AAA 888