I'm trying to filter a pandas column by a string, but the problem is that each row holds a list rather than a plain string.
A small example of the column:
tags
['get_mail_mail']
['app', 'oflline_hub', 'smart_home']
['get_mail_mail', 'smart_home']
['web']
[]
[]
['get_mail_mail']
I'm using this:
df[df["tags"].str.contains("smart_home", case=False, na=False)]
but it returns an empty DataFrame.
CodePudding user response:
You can explode, then compare and aggregate with groupby.any:
m = (df['tags'].explode()
     .str.contains('smart_home', case=False, na=False)
     .groupby(level=0).any()
    )
out = df[m]
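To make the steps above concrete, here is a minimal, self-contained sketch. It assumes the column looks like the sample in the question; the names `direct` and `exploded` are only illustrative. It also shows why the original attempt comes back empty: `.str.contains` cannot match against a list, so every row falls back to the `na` value.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})

# .str.contains cannot match list elements, so each row
# falls back to the na value (False here)
direct = df["tags"].str.contains("smart_home", case=False, na=False)

# explode() gives one row per tag and repeats the original index label;
# empty lists become a single NaN row
exploded = df["tags"].explode()

# Test each tag, then collapse back to one boolean per original row
m = (exploded
     .str.contains("smart_home", case=False, na=False)
     .groupby(level=0).any())

out = df[m]
print(out)
```

Because `groupby(level=0)` groups on the original index labels, the mask lines up with `df` again and can be used directly for boolean indexing.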
Or concatenate the strings with a delimiter and use str.contains:
# .str.join('|') joins each row's list into one string
out = df[df['tags'].str.join('|').str.contains('smart_home')]
Or use a list comprehension:
out = df[['smart_home' in tags for tags in df['tags']]]
Output:
                             tags
1  [app, oflline_hub, smart_home]
2     [get_mail_mail, smart_home]
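One thing worth keeping in mind: the str.contains variants do a substring test, while the list comprehension tests exact membership. On the sample data they give the same rows, but they can differ on other data. A small sketch, assuming a hypothetical tag `smart_home_v2` that does not appear in the question:

```python
import pandas as pd

# Hypothetical data: "smart_home_v2" is an invented tag, not from the question
df = pd.DataFrame({"tags": [
    ["smart_home"],
    ["smart_home_v2"],
    ["web"],
    [],
]})

# Substring test on the joined string: also matches "smart_home_v2"
substring = df[df["tags"].str.join("|").str.contains("smart_home", regex=False)]

# Exact membership test on each list: matches only the tag "smart_home"
exact = df[["smart_home" in tags for tags in df["tags"]]]

print(substring.index.tolist())
print(exact.index.tolist())
```

Which behaviour is preferable depends on whether partial tag names should count as a match.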
CodePudding user response:
You could try:
# list of patterns to search for
pattern = ["smart_home"]
df.loc[df.apply(lambda x: any(m in str(v)
                              for v in x.values
                              for m in pattern),
                axis=1)]
Output:
    tags
--  ------------------------------------
 1  ['app', 'oflline_hub', 'smart_home']
 2  ['get_mail_mail', 'smart_home']
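Note that the snippet above converts every cell of every column to a string and does a substring test. If only the tags column matters and whole tags should match, a possible variant is sketched below. It assumes the sample data from the question; the second pattern "web" and the names `patterns` and `mask` are added purely for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})

# Hypothetical pattern set: "web" is added only to show several patterns
patterns = {"smart_home", "web"}

# True when a row's list shares at least one tag with `patterns`
mask = df["tags"].map(lambda tags: not patterns.isdisjoint(tags))

out = df[mask]
print(out)
```

Using a set keeps the per-row check short, and empty lists simply yield False.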