Home > Back-end >  Filter pandas dataframe rows where a specific row with column A and value X has column B with value
Filter pandas dataframe rows where a specific row with column A and value X has column B with value

Time:03-16

I wonder if there's a simpler way to filter a pandas DataFrame rows where a specific row with column A and value X has column B with value Y greater than a parameter Z.


For example:

A B
72154 X1 0.998429
72155 X2 0.584253
72156 X3 0.797648
72157 X2 0.981707
72158 X1 0.698844
72159 X3 0.987943
72160 X1 0.797648
72161 X3 0.984621
72162 X2 0.221968

I've managed to get it done with this code:

import pandas as pd

Z = 0.8
mask1 = (df.A.isin(['X1', 'X2']))
mask2 = (df.B > Z)
mask3 = (df.A == 'X3')
sub_df_x1_x2 = df[mask1 & mask2]
sub_df_x3  = df[mask3]
final_df = pd.concat([sub_df_x1_x2, sub_df_x3])

But I don't believe this is the cleanest or best way to do it. Do you guys have any idea? I was thinking about something like this, but I was not able to get it to work.

mask1 = (df.A.isin(['X1', 'X2']) & df.B > Z)
# or
mask1 = (df.A.isin(['X1', 'X2'])[B] > Z)

CodePudding user response:

You could chain the two with |:

out = df[(df.A.isin(['X1', 'X2']) & (df.B > Z)) | (df.A == 'X3')]

or using the definitions you already have:

out = df[(mask1 & mask2) | mask3]

Output:

        A         B
72154  X1  0.998429
72156  X3  0.797648
72157  X2  0.981707
72159  X3  0.987943
72161  X3  0.984621
  • Related