Home > Software engineering >  Filter pandas dataframe by condition for each row
Filter pandas dataframe by condition for each row

Time:06-30

df= pd.DataFrame({'Age': [30, 35, 37, 33, 34, 30],
                  'Name': ['A', 'B', 'B', 'A', 'A', 'B']})
df2= pd.DataFrame({'Age': [30, 35], 'Name': ['A', 'B']})

How would I go on about filtering df for df2 so that for each df2['Name'] in df, the Age in df should be below df2['Age']? My solution was to copy paste df2 into conditions like so but I think creating a list/dataframe for the filter would be a better idea as you may extend it in future:

display(
    df.loc[
        (
            ((df["Name"] == "A") & (df["Age"] < 30))
            | ((df["Name"] == "B") & (df["Age"] < 35))
        ),
        ["Age", "Name"],
    ]
)

CodePudding user response:

IIUC, you can use merge then query:

out = df.merge(df2, on='Name', suffixes=(None, '2')).query('Age < Age2')[df.columns]
print(out)

# Output
   Age Name
5   30    B

Step by step:

# Merge data
>>> out = df.merge(df2, on='Name', suffixes=(None, '2'))
   Age Name  Age2
0   30    A    30
1   33    A    30
2   34    A    30
3   35    B    35
4   37    B    35
5   30    B    35

# Filter out
>>> out = out.query('Age < Age2')
   Age Name  Age2
5   30    B    35

# Restore columns
>>> out = out[df.columns]
   Age Name
5   30    B

CodePudding user response:

I would merge the 2 dataframes together, to have another column from df2 of the age filter, like this:

dfmerge = df.merge(df2, on='Name', suffixes=[None, ' Filter'])

   Age Name  Age Filter
0   30    A          30
1   33    A          30
2   34    A          30
3   35    B          35
4   37    B          35
5   30    B          35

You can then use .loc to filter on where Age is less than Age Filter:

dfout = dfmerge.loc[dfmerge['Age'] < dfmerge['Age Filter'], ['Age', 'Name']]

   Age Name
5   30    B

CodePudding user response:

Use DataFrame.query with generate conditions by df2 values:

d = df2.to_dict('records')

q = ' | '.join(f'Name == \"{x["Name"]}\" & Age < {x["Age"]}' for x in d)
print (q)
Name == "A" & Age < 30 | Name == "B" & Age < 35
    

df = df.query(q)[["Age", "Name"]]
print (df)
   Age Name
5   30    B

Or filter by mask in list comprehension and join mask by np.logical_or.reduce:

mask = np.logical_or.reduce([(df['Name'] == x["Name"]) & (df.Age < x["Age"]) for x in d])
df = df.loc[mask, ["Age", "Name"]]
print (df)
   Age Name
5   30    B
    
  • Related