Filtering a column on pandas based on a string

Time:10-12

I'm trying to filter a pandas DataFrame on a column by matching a string, but the problem is that each row of that column holds a list of strings rather than a single string.

A small example of the column:

tags
['get_mail_mail']
['app', 'oflline_hub', 'smart_home']
['get_mail_mail', 'smart_home']
['web']
[]
[]
['get_mail_mail']

and I'm using this:

df[df["tags"].str.contains("smart_home", case=False, na=False)]

but it's returning an empty dataframe.

CodePudding user response:

You can explode the lists into one row per tag, test each tag, then aggregate back to the original rows with groupby.any:

m = (df['tags'].explode()
     .str.contains('smart_home', case=False, na=False)
     .groupby(level=0).any()
    )

out = df[m]
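
As a minimal, self-contained sketch of the explode approach, the snippet below rebuilds the sample column from the question and names each intermediate step. The names exploded, hits and mask are only illustrative.

```python
import pandas as pd

# Sample column rebuilt from the question
df = pd.DataFrame({
    "tags": [
        ["get_mail_mail"],
        ["app", "oflline_hub", "smart_home"],
        ["get_mail_mail", "smart_home"],
        ["web"],
        [],
        [],
        ["get_mail_mail"],
    ]
})

# One row per tag; the original row label is repeated in the index.
# Empty lists become a single NaN row.
exploded = df["tags"].explode()

# Test each tag on its own; na=False turns the NaN rows into False
hits = exploded.str.contains("smart_home", case=False, na=False)

# Collapse back to one boolean per original row
mask = hits.groupby(level=0).any()

out = df[mask]
print(out)
```

Because the mask is rebuilt on the original index, it lines up with df even when some lists are empty.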

Or join each list into a single delimited string and use str.contains:

out = df[df['tags'].str.join('|').str.contains('smart_home')]

Or use a list comprehension:

out = df[['smart_home' in tags for tags in df['tags']]]

output:

                             tags
1  [app, oflline_hub, smart_home]
2     [get_mail_mail, smart_home]
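
One difference between the options is worth keeping in mind: the explode and joined-string versions match substrings, while the list comprehension tests whole tags. The sketch below illustrates this with a made-up tag, smart_home_v2, which does not appear in the question's data.

```python
import pandas as pd

# Hypothetical data: 'smart_home_v2' is invented to show the difference
df = pd.DataFrame({"tags": [["smart_home"], ["smart_home_v2"], ["web"], []]})

# Substring match: join each list into one string, then search it
substring = df["tags"].str.join("|").str.contains("smart_home")

# Exact match: the tag must be an element of the list
exact = pd.Series(["smart_home" in tags for tags in df["tags"]], index=df.index)

print(df[substring])  # rows 0 and 1
print(df[exact])      # row 0 only
```

If only whole tags should count, the exact version is likely the safer choice.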

CodePudding user response:

You could try:

# list of substrings to search for
pattern = ["smart_home"]

# str(v) converts each list to its text form, so this is a substring match
mask = df.apply(lambda row: any(p in str(v)
                                for v in row.values
                                for p in pattern),
                axis=1)
df.loc[mask]

Output

    tags
--  ------------------------------------
 1  ['app', 'oflline_hub', 'smart_home']
 2  ['get_mail_mail', 'smart_home']
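
If several whole tags need to be matched rather than substrings, one possible variation is to keep the search terms in a set and test each list against it. This is only a sketch: the extra term 'web' and the names patterns and mask are chosen for illustration.

```python
import pandas as pd

# Sample column rebuilt from the question
df = pd.DataFrame({
    "tags": [
        ["get_mail_mail"],
        ["app", "oflline_hub", "smart_home"],
        ["get_mail_mail", "smart_home"],
        ["web"],
        [],
        [],
        ["get_mail_mail"],
    ]
})

# Tags to look for; 'web' is added here only for illustration
patterns = {"smart_home", "web"}

# True when the row's list shares at least one tag with patterns
mask = df["tags"].map(lambda tags: not patterns.isdisjoint(tags))

out = df[mask]
print(out)
```

This checks only the tags column and compares whole tags, so it would not match a tag that merely contains one of the search terms.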