Apply groupby only in rows where a condition is met

Time:09-07

How do you apply a groupby to a DataFrame only for the rows that meet a condition, while still keeping all the other rows that do not meet it?

For instance, let's say I have a dataframe:

Number  Version  Binary  Random column
     1        0    True              a
     1        1    True              b
     1        2    True              c
     2        0   False              a
     2        1   False              b

With this DataFrame, I would like to group the rows whose Binary value is True by Number and keep only the row with the latest version (that is, the largest Version) in each group, while still keeping the rows that do not meet this condition.

The result should look like this:

Number  Version  Binary  Random column
     1        2    True              c
     2        0   False              a
     2        1   False              b
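To try the answers below, the sample frame can be rebuilt as follows. This is a sketch that assumes the Binary column holds real booleans (not the strings 'True'/'False'); the answers rely on that for df['Binary'] as a row filter and ~df['Binary'] as its negation.

```python
import pandas as pd

# Sample data from the question; "Random column" keeps the space in its name
df = pd.DataFrame({
    'Number': [1, 1, 1, 2, 2],
    'Version': [0, 1, 2, 0, 1],
    'Binary': [True, True, True, False, False],
    'Random column': ['a', 'b', 'c', 'a', 'b'],
})

# Binary must be bool dtype so that ~df['Binary'] is a logical NOT
print(df.dtypes)
```

If the data comes from a CSV where Binary was read as text, something like df['Binary'] = df['Binary'].eq('True') would convert it first.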

Answer:

Here is a way using groupby().idxmax() and pd.concat():

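import pandas as pd

# Binary == True rows: keep only the row with the highest Version per Number
latest = df.loc[df.loc[df['Binary']].groupby('Number')['Version'].idxmax()]
# Binary == False rows are kept unchanged
pd.concat([latest, df.loc[~df['Binary']]], ignore_index=True)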

Output:

   Number  Version  Binary Random column
0       1        2    True             c
1       2        0   False             a
2       2        1   False             b
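A variant of the same concat idea that uses sort_values and drop_duplicates instead of idxmax is sketched below. This is a swapped-in technique, not the answerer's code, and it assumes the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [1, 1, 1, 2, 2],
    'Version': [0, 1, 2, 0, 1],
    'Binary': [True, True, True, False, False],
    'Random column': ['a', 'b', 'c', 'a', 'b'],
})

# Among Binary == True rows, sort by Version so the latest one comes last,
# then keep only the last row for each Number
latest = (df[df['Binary']]
          .sort_values('Version', kind='stable')
          .drop_duplicates(subset='Number', keep='last'))

# Binary == False rows pass through untouched
out = pd.concat([latest, df[~df['Binary']]], ignore_index=True)
print(out)
```

One difference from idxmax: if two rows of a Number share the highest Version, idxmax keeps the first of them, while this version (with the stable sort) keeps the later one.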

Answer:

I would use boolean indexing here:

# is the row the max Version of its Number group?
m1 = df['Version'].eq(df.groupby('Number')['Version'].transform('max'))
# is Binary False (rows that should be kept as-is)?
m2 = ~df['Binary']

# keep the row if either condition above is met
df[m1 | m2]

Output:

   Number  Version  Binary Random column
2       1        2    True             c
3       2        0   False             a
4       2        1   False             b
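The mask needs the maximum Version per Number; a maximum taken over the whole Binary group only matches the single highest Version among all True rows. The sketch below adds a hypothetical Number 3 (not part of the question's data) whose latest Version is lower than Number 1's, to show the difference.

```python
import pandas as pd

# Question's data plus a hypothetical Number 3 with Binary == True
df = pd.DataFrame({
    'Number': [1, 1, 1, 2, 2, 3, 3],
    'Version': [0, 1, 2, 0, 1, 0, 1],
    'Binary': [True, True, True, False, False, True, True],
    'Random column': ['a', 'b', 'c', 'a', 'b', 'a', 'b'],
})

# Max Version per Number, broadcast back to every row by transform
m1 = df['Version'].eq(df.groupby('Number')['Version'].transform('max'))
# Rows that are not subject to the filter
m2 = ~df['Binary']
out = df[m1 | m2]
print(out)

# For contrast: a max over the whole Binary group drops Number 3 entirely,
# because its latest Version (1) is below the overall True max (2)
by_binary = df['Version'].eq(df.groupby('Binary')['Version'].transform('max'))
print(df[by_binary | m2])
```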

Alternative:

If you need to perform a more complex operation, you could also apply it only to the group that matches the condition and return the other group unchanged:

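# True group: keep the latest Version per Number; False group: return it unchanged
(df.groupby('Binary', group_keys=False, sort=False)
   .apply(lambda g: g.loc[g['Version'].eq(g.groupby('Number')['Version'].transform('max'))]
          if g.name else g)
 )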
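A self-contained sketch of the apply-based alternative, using a named function (keep_latest, a hypothetical name) instead of a lambda, and taking the maximum per Number inside the True group. It assumes the question's sample data. Newer pandas releases deprecate passing the grouping column to the applied function, so depending on the version the Binary column may be absent from the result; the function only reads g.name, so its logic does not depend on that.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [1, 1, 1, 2, 2],
    'Version': [0, 1, 2, 0, 1],
    'Binary': [True, True, True, False, False],
    'Random column': ['a', 'b', 'c', 'a', 'b'],
})

def keep_latest(g):
    # g.name is this group's key: True or False for the Binary column
    if g.name:
        # latest Version within each Number of the Binary == True rows
        per_number_max = g.groupby('Number')['Version'].transform('max')
        return g.loc[g['Version'].eq(per_number_max)]
    # Binary == False rows come back unchanged
    return g

# sort=False keeps groups in order of first appearance (True, then False);
# group_keys=False keeps the original row labels instead of adding a key level
out = df.groupby('Binary', group_keys=False, sort=False).apply(keep_latest)
print(out)
```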