I have a dataframe and I want to locate rows in the dataframe based on an arbitrary number of boolean conditions on multiple columns. Currently I'm doing this by formatting a complex query string, which is an unsafe pattern (although I'm not too concerned about the specific code here). It looks like this:
import pandas as pd

df = pd.DataFrame({
    'a_id': [1, 3, 5],
    'b_id': [2, 7, 9],
    'c_id': [3, 4, 5]
})
ids_of_interest = [2, 4]
components_to_query = ['a', 'c']
query = '({})'.format(')|('.join([
    f'{c}_id.isin(@ids_of_interest)' for c in components_to_query
]))
df.query(query)
a_id b_id c_id
1 3 7 4
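To make the string-building step concrete, here is a small sketch (plain Python, no pandas needed) of what the format call produces for the example components. It just builds one isin clause per component, wraps each in parentheses and joins them with |:

```python
components_to_query = ['a', 'c']

# One clause per component, e.g. "a_id.isin(@ids_of_interest)"
clauses = [f'{c}_id.isin(@ids_of_interest)' for c in components_to_query]

# Wrap each clause in parentheses and OR them together
query = '({})'.format(')|('.join(clauses))
print(query)
# (a_id.isin(@ids_of_interest))|(c_id.isin(@ids_of_interest))
```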
The only other approach I can come up with is below, but it relies on a very non-pythonic initialization of an all-False Series that is then updated in a loop.
query = pd.Series([False] * len(df), index=df.index)
for c in components_to_query:
    query = query | df[c + '_id'].isin(ids_of_interest)
df[query]
What's the pythonic way to locate these rows (using query or any other method)?
Answer:
You could do this with isin followed by any:
col = [f'{c}_id' for c in components_to_query]
out = df[df[col].isin(ids_of_interest).any(axis=1)]
a_id b_id c_id
1 3 7 4
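A sketch, not from the original answer, that puts the answer's any approach next to a loop-free take on the question's accumulator (folding the per-column masks with functools.reduce and operator.or_), reusing the example data from the question:

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'a_id': [1, 3, 5],
    'b_id': [2, 7, 9],
    'c_id': [3, 4, 5]
})
ids_of_interest = [2, 4]
components_to_query = ['a', 'c']
cols = [f'{c}_id' for c in components_to_query]

# Approach from the answer: isin on the sub-frame yields a boolean DataFrame,
# and any(axis=1) marks a row True if at least one of its columns matched.
mask_any = df[cols].isin(ids_of_interest).any(axis=1)

# Loop-free version of the question's accumulator: fold the per-column
# masks together with element-wise OR (m1 | m2 | ...).
mask_reduce = reduce(operator.or_, (df[c].isin(ids_of_interest) for c in cols))

print(df[mask_any])
```

One difference worth knowing: reduce with no initial value raises TypeError when the list of columns is empty, whereas the any(axis=1) version simply selects no rows. Passing pd.Series(False, index=df.index) as the third argument to reduce covers that case.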