how do I construct a pandas boolean series from an arbitrary number of conditions

Time:11-22

I have a dataframe and I want to locate rows in the dataframe based on an arbitrary number of boolean conditions on multiple columns. Currently I'm doing this by formatting a complex query string, which is an unsafe pattern (although I'm not too concerned about the specific code here). It looks like this:

import pandas as pd

df = pd.DataFrame({
    'a_id': [1, 3, 5],
    'b_id': [2, 7, 9],
    'c_id': [3, 4, 5]
})
ids_of_interest = [2, 4]
components_to_query = ['a', 'c']

query = '({})'.format(')|('.join([
    f'{c}_id.isin(@ids_of_interest)' for c in components_to_query
]))
df.query(query)
   a_id  b_id  c_id
1     3     7     4

The only other way I can come up with to do this is below, but it involves a very non-pythonic initialization of an array that's then modified in a loop.

query = pd.Series([False]*len(df))
for c in components_to_query:
    query = query | df[c + '_id'].isin(ids_of_interest)

What's the pythonic way to locate these rows (using query or any other method)?

CodePudding user response:

You could do this with DataFrame.any, applied row-wise:

col = [f'{c}_id' for c in components_to_query]
out = df[df[col].isin(ids_of_interest).any(axis=1)]
Out[268]: 
   a_id  b_id  c_id
1     3     7     4
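
The isin/any approach works when every condition is the same membership test. If the conditions differ from column to column, one possible sketch is to build a list of boolean Series and combine them with functools.reduce. The names conditions, mask and result below are illustrative, not from the question, and the data is the question's example frame.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'a_id': [1, 3, 5],
    'b_id': [2, 7, 9],
    'c_id': [3, 4, 5],
})
ids_of_interest = [2, 4]
components_to_query = ['a', 'c']

# One boolean Series per condition. They are all isin checks here, but any
# expression that yields a boolean Series aligned with df would fit.
conditions = [df[f'{c}_id'].isin(ids_of_interest) for c in components_to_query]

# OR the Series together pairwise; swap in operator.and_ to require every
# condition instead of any of them.
mask = reduce(operator.or_, conditions)
result = df[mask]
```

With this data, mask is True only for row 1 (c_id is 4), so result matches the output above. Note that reduce raises a TypeError on an empty list unless you pass an initial value, so an empty components_to_query needs separate handling.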