Home > OS >  Select pandas rows where column is a substring
Select pandas rows where column is a substring

Time:02-11

I'd like to filter my dataframe such that only rows that have a column containing a substring of another string are selected. I know that the opposite can be done like this:

selection = df[df.str.contains(substring)]

But how would I do it such that the substring is in the dataframe and it is compared to another string. What I've tried are

import pandas
a = pandas.DataFrame({"b":["foo","bar"]})
selection = a[a.b.str in "foot"] # should match first row
selection = a[a.b.str.isin("foot")] # should match first row
selection = a[a.b.str.isin("foobar")] # should match both rows

but these won't work

CodePudding user response:

You can do this.

import pandas as pd
a = pd.DataFrame({"b":["foo","bar"]})
selection = a[a.b.apply(lambda x: x in "foot")] # should match first row
selection = a[a.b.apply(lambda x: x in "foot")] # should match first row
selection = a[a.b.apply(lambda x: x in "foobar")] # should match both rows

CodePudding user response:

What you could try is make function like so:

f = lambda substring: substring in string

Where string is your target string. Then, apply this function elementwise to the df.

df.applymap(f)

This should return a DataFrame with True where the element is a substring of your target string, and False where it isn't. To find out which rows contain a True, i.e. contain elements that are substrings of your target string, us df.any():

df.applymap(lambda substring: substring in string).any(axis = 1)

If you want to just select these rows use slicing:

df[df.applymap(lambda substring: substring in string).any(axis = 1)]
  • Related