I have a dataframe like:
Tag Class
A P
A Q
B P
B Q
C P
C Q
C R
I want to group by Tag and keep the first value from Class. However, if that value has already been used by an earlier tag, take the next unused value within the tag.
Expected output:
Tag Class
A P
B Q
C R
If there is no class left for the tag, then return null (or don't include Tag in output).
I have been trying to do this with drop_duplicates, but with no luck. How can I achieve this?
CodePudding user response:
We can define a custom function, let's call it dedupe, which keeps internal state in a set variable s to track the classes already used, and returns the first class in each group that has not been used before:
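def dedupe():
    s = set()  # classes already assigned to an earlier tag

    def _dedupe(c):
        # keep only the classes of this group that are still unused
        c = c[~c.isin(s)]
        if len(c) > 0:
            s.add(c.iat[0])
            return c.iat[0]
        # nothing left: falls through and returns None for this tag

    return _dedupe
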
df.groupby('Tag', sort=False, as_index=False)['Class'].apply(dedupe())

  Tag Class
0   A     P
1   B     Q
2   C     R
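
For reference, here is a self-contained sketch of the same greedy idea written as an explicit loop over the groups, which also covers the "no class left" case from the question. The helper name first_unused_class and the extra tag D are made up for illustration and are not part of the original data; this is one possible way to write it, not the only one.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical tag "D" whose only
# class ("P") has already been taken by tag "A".
df = pd.DataFrame({
    "Tag":   ["A", "A", "B", "B", "C", "C", "C", "D"],
    "Class": ["P", "Q", "P", "Q", "P", "Q", "R", "P"],
})


def first_unused_class(frame, key="Tag", value="Class"):
    """For each group (in order of appearance), pick the first value
    that no earlier group has picked; None if nothing is left."""
    used = set()
    rows = []
    for tag, classes in frame.groupby(key, sort=False)[value]:
        # first class in this group that is not in `used`, else None
        pick = next((c for c in classes if c not in used), None)
        if pick is not None:
            used.add(pick)
        rows.append({key: tag, value: pick})
    return pd.DataFrame(rows, columns=[key, value])


result = first_unused_class(df)
print(result)

# To leave such tags out of the output instead of keeping a null:
print(result.dropna(subset=["Class"]))
```

Here A, B and C get P, Q and R as in the expected output, while D ends up with a missing Class because P is already taken. Note that the selection is greedy and depends on the order in which the tags appear, so an earlier tag can take a class that a later tag would have needed.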