How do you apply a groupby only to the rows of a DataFrame that meet a condition, while still keeping all the rows that do not meet it?
For instance, say I have this DataFrame:
Number | Version | Binary | Random column |
---|---|---|---|
1 | 0 | True | a |
1 | 1 | True | b |
1 | 2 | True | c |
2 | 0 | False | a |
2 | 1 | False | b |
With this DataFrame, I would like to group the rows whose Binary value is True by Number and keep only the row with the latest (that is, the largest) Version in each group, while keeping every row that does not meet the condition unchanged.
The result should look like this:
Number | Version | Binary | Random column |
---|---|---|---|
1 | 2 | True | c |
2 | 0 | False | a |
2 | 1 | False | b |
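To make the example reproducible, here is a minimal sketch that builds the sample frame as a pandas DataFrame. It assumes Binary is a real bool column (not the strings "True"/"False"), since both answers below use `df['Binary']` directly as a mask and `~df['Binary']` as its negation.

```python
import pandas as pd

# Sample frame from the question. Binary is assumed to be a bool column,
# so it can be used directly as a boolean mask.
df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2],
    "Version": [0, 1, 2, 0, 1],
    "Binary": [True, True, True, False, False],
    "Random column": ["a", "b", "c", "a", "b"],
})

# Desired output: the latest Version of Number 1 (Binary True) plus every
# row where Binary is False, keeping the original index labels.
expected = df.loc[[2, 3, 4]]
print(expected)
```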
Answer:
I would use boolean indexing here:
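# is the row the latest version of its Number (among rows with the same Binary value)?
m1 = df['Version'].eq(df.groupby(['Number', 'Binary'])['Version'].transform('max'))
# is Binary False? (Binary must be a bool column for ~ to act as logical NOT)
m2 = ~df['Binary']
# keep the row if either condition above is met
df[m1 | m2]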
Output:
Number Version Binary Random column
2 1 2 True c
3 2 0 False a
4 2 1 False b
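The grouping key matters once more than one Number has Binary True. The sketch below adds a hypothetical Number 3 (values invented purely for illustration) and compares grouping by `['Number', 'Binary']` with grouping by `Binary` alone: the latter compares every True row against the single largest Version across all True rows, so Number 3, whose latest version is 1, disappears.

```python
import pandas as pd

# Question's data plus a hypothetical Number 3 whose rows are also True.
df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2, 3, 3],
    "Version": [0, 1, 2, 0, 1, 0, 1],
    "Binary": [True, True, True, False, False, True, True],
    "Random column": ["a", "b", "c", "a", "b", "x", "y"],
})

not_binary = ~df["Binary"]  # rows that must be kept untouched

# Max Version within each (Number, Binary) group, broadcast back to every row.
group_max = df.groupby(["Number", "Binary"])["Version"].transform("max")
result = df[df["Version"].eq(group_max) | not_binary]
print(result)  # keeps Number 1 / Version 2 and Number 3 / Version 1

# Grouping by Binary alone: all True rows share one max (2), so Number 3
# has no row equal to it and is dropped.
binary_max = df.groupby("Binary")["Version"].transform("max")
wrong = df[df["Version"].eq(binary_max) | not_binary]
print(wrong)
```

Because the mask compares against the group maximum, rows tied at the latest Version within a Number are all kept.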
Answer:
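Here is a way using pd.concat() and idxmax():
import pandas as pd

# rows with the latest Version per Number among the True rows, followed by all False rows
(pd.concat([df.loc[df.loc[df['Binary']].groupby('Number')['Version'].idxmax()],
            df.loc[~df['Binary']]],
           ignore_index=True))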
Output:
Number Version Binary Random column
0 1 2 True c
1 2 0 False a
2 2 1 False b
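A variant of the same split-and-concatenate idea uses `sort_values()` plus `drop_duplicates()` in place of `idxmax()`. This is a sketch on the question's sample data, not the answerer's code: the True rows are sorted by Version so the latest one comes last, one row per Number is kept, and the False rows are appended. `sort_index()` is used instead of `ignore_index=True` so the result keeps the original row order and index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2],
    "Version": [0, 1, 2, 0, 1],
    "Binary": [True, True, True, False, False],
    "Random column": ["a", "b", "c", "a", "b"],
})

is_binary = df["Binary"]

# Among the True rows, put the highest Version last and keep only the
# last row for each Number.
latest = (
    df[is_binary]
    .sort_values("Version")
    .drop_duplicates(subset="Number", keep="last")
)

# Append the untouched False rows and restore the original row order.
result = pd.concat([latest, df[~is_binary]]).sort_index()
print(result)
```

Like `idxmax()`, this keeps exactly one row per Number even if two rows tie on the latest Version, whereas the `transform('max')` mask keeps every tied row.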