df= pd.DataFrame({'Age': [30, 35, 37, 33, 34, 30],
'Name': ['A', 'B', 'B', 'A', 'A', 'B']})
df2= pd.DataFrame({'Age': [30, 35], 'Name': ['A', 'B']})
How would I go on about filtering df for df2 so that for each df2['Name'] in df, the Age in df should be below df2['Age']? My solution was to copy paste df2 into conditions like so but I think creating a list/dataframe for the filter would be a better idea as you may extend it in future:
display(
df.loc[
(
((df["Name"] == "A") & (df["Age"] < 30))
| ((df["Name"] == "B") & (df["Age"] < 35))
),
["Age", "Name"],
]
)
CodePudding user response:
IIUC, you can use merge
then query
:
out = df.merge(df2, on='Name', suffixes=(None, '2')).query('Age < Age2')[df.columns]
print(out)
# Output
Age Name
5 30 B
Step by step:
# Merge data
>>> out = df.merge(df2, on='Name', suffixes=(None, '2'))
Age Name Age2
0 30 A 30
1 33 A 30
2 34 A 30
3 35 B 35
4 37 B 35
5 30 B 35
# Filter out
>>> out = out.query('Age < Age2')
Age Name Age2
5 30 B 35
# Restore columns
>>> out = out[df.columns]
Age Name
5 30 B
CodePudding user response:
I would merge
the 2 dataframes together, to have another column from df2 of the age filter, like this:
dfmerge = df.merge(df2, on='Name', suffixes=[None, ' Filter'])
Age Name Age Filter
0 30 A 30
1 33 A 30
2 34 A 30
3 35 B 35
4 37 B 35
5 30 B 35
You can then use .loc
to filter on where Age
is less than Age Filter
:
dfout = dfmerge.loc[dfmerge['Age'] < dfmerge['Age Filter'], ['Age', 'Name']]
Age Name
5 30 B
CodePudding user response:
Use DataFrame.query
with generate conditions by df2
values:
d = df2.to_dict('records')
q = ' | '.join(f'Name == \"{x["Name"]}\" & Age < {x["Age"]}' for x in d)
print (q)
Name == "A" & Age < 30 | Name == "B" & Age < 35
df = df.query(q)[["Age", "Name"]]
print (df)
Age Name
5 30 B
Or filter by mask in list comprehension and join mask by np.logical_or.reduce:
mask = np.logical_or.reduce([(df['Name'] == x["Name"]) & (df.Age < x["Age"]) for x in d])
df = df.loc[mask, ["Age", "Name"]]
print (df)
Age Name
5 30 B