Home > database >  Pandas select rows matching variable number of values in many columns
Pandas select rows matching variable number of values in many columns

Time:10-09

I am trying to write a function that will select a subset of a pd.DataFrame. I want to use multiple selection values for different columns.

For example:

column_a matches 'a' or 'b' or ...
column_b matches 'A' or 'B' or 'X' ...

CodePudding user response:

If you provide your values as variable args or a list you can use pd.Series.isin

# Using `*args`
def select_args(df, column, *args):
    return df[df[column].isin(args)]

# Using a list
def select_list(df, column, list_):
    return df[df[column].isin(list_)]

Use like so:

>>> df_sub = select_args(df, "indicator", "A", "B", "C")
>>> df_sub = select_list(df, "indicator", ["A", "B", "C"])

For using multiple columns, with many different variables you should structure these data as pairs. For example, you could use a dict where each key is a column and the value is a list:

search_params = {
    "column_a": ["a", "b", "c"],
    "column_b": ["A", "B", "C"],
}

Then you can create your boolean arrays, combine them using np.all and subset your DataFrame:

def select_multiple(df, selectors):
    bools = [df[col].isin(vals).to_numpy() for col, vals in selectors.items()]
    return df[np.all(bools, axis=0)]

>>> df_sub = select_multiple(df, search_params)
  • Related