Selection in a dataframe based on multiple conditions

Time:10-22

I am developing a dashboard using Dash. The user can select different parameters (6 in total), and a dataframe is updated accordingly.

My idea was to do the following:

filtering = []
if len(filter1)>0:
    filtering.append("df['col1'].isin(filter1)")
if len(filter2)>0:
    filtering.append("df['col2'].isin(filter2)")
condition = ' & '.join(filtering)
df.loc[condition]

But I get a KeyError, which I understand, since condition is a string and df.loc treats it as a label to look up. Any advice on how I can do this? What is the best practice?

NB: I have a solution using if conditions, but I would like to optimise this part and avoid copying the dataframe (more than 10 million rows).

dff = df.copy()
if len(filter1)>0:
    dff = dff.loc[dff.col1.isin(filter1)]
if len(filter2)>0:
    dff = dff.loc[dff.col2.isin(filter2)]

CodePudding user response:

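You can use eval to turn the condition string into a boolean mask. Since eval executes whatever the string contains, only use it on strings built by your own code, as here: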

filtering = []
if len(filter1)>0:
    filtering.append("df['col1'].isin(filter1)")
if len(filter2)>0:
    filtering.append("df['col2'].isin(filter2)")
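# eval() raises a SyntaxError on an empty string, so only filter when at least one condition exists
if filtering:
    condition = ' & '.join(filtering)
    df.loc[eval(condition)]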
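
If building a string is still preferred, a possible alternative to eval is DataFrame.query, which can reference local variables with an @ prefix. The sketch below is only an illustration: the function name select_rows and the sample data are hypothetical, and the column names col1 and col2 are taken from the question.

```python
import pandas as pd


def select_rows(df, filter1, filter2):
    """Hypothetical helper: keep rows matching every non-empty filter list."""
    parts = []
    if len(filter1) > 0:
        # '@filter1' refers to the local variable filter1 inside this function
        parts.append("col1 in @filter1")
    if len(filter2) > 0:
        parts.append("col2 in @filter2")
    if not parts:
        # No filter selected: return the dataframe unchanged
        return df
    return df.query(" and ".join(parts))


# Made-up sample data, for illustration only
sample = pd.DataFrame({"col1": ["a", "b", "c", "a"], "col2": [1, 2, 3, 4]})
selected = select_rows(sample, ["a", "b"], [2, 4])
```

The expression string only contains column names and variable references, so the selected values themselves are never pasted into code that gets evaluated.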

CodePudding user response:

You can merge the masks using the & operator and apply the merged mask only once:

from functools import reduce

filters = []
if len(filter1)>0:
    filters.append(df.col1.isin(filter1))
if len(filter2)>0:
    filters.append(df.col2.isin(filter2))

if len(filters) > 0:
    final_filter = reduce(lambda a, b: a&b, filters)
    df = df.loc[final_filter]
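
With six parameters, the repeated if blocks could be folded into a loop over a mapping from column name to selected values. This is a minimal sketch under that assumption: the function name filter_rows, the selections dictionary, and the sample data are hypothetical and not part of the original answer.

```python
import pandas as pd


def filter_rows(df, selections):
    """Hypothetical helper: selections maps a column name to a list of allowed values.

    Empty lists are skipped, so an unset dashboard parameter does not filter anything.
    """
    # Start from an all-True mask and narrow it down one column at a time
    mask = pd.Series(True, index=df.index)
    for column, allowed in selections.items():
        if len(allowed) > 0:
            mask &= df[column].isin(allowed)
    # Apply the combined mask once; no explicit df.copy() is needed beforehand
    return df.loc[mask]


# Made-up sample data, for illustration only
sample = pd.DataFrame({"col1": ["a", "b", "c", "a"], "col2": [1, 2, 3, 4]})
both = filter_rows(sample, {"col1": ["a", "b"], "col2": [2, 4]})
only_col1 = filter_rows(sample, {"col1": ["a", "b"], "col2": []})
```

Only boolean masks (one value per row) are combined along the way, and the dataframe itself is indexed a single time at the end.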