I would like to check for a substring match between the comments and keywords columns, and find out whether any of the keywords is present in that particular row.
Input:
name comments keywords
0 paul account is active active,activated,activ
1 john account is activated active,activated,activ
2 max account is activateds active,activated,activ
Expected output:
match
True
True
True
CodePudding user response:
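pandas string methods take a single pattern rather than a second column, so a plain loop (a list comprehension) is usually the most efficient option here. For whole-word matches you can use set intersection:
df['match'] = [not set(c.split()).isdisjoint(k.split(','))
               for c, k in zip(df['comments'], df['keywords'])]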
Output:
name comments keywords match
0 paul account is active active,activated,activ True
1 john account is activated active,activated,activ True
2 max account is activateds active,activated,activ False
Used input:
import pandas as pd
df = pd.DataFrame({'name': ['paul', 'john', 'max'],
                   'comments': ['account is active', 'account is activated', 'account is activateds'],
                   'keywords': ['active,activated,activ', 'active,activated,activ', 'active,activated,activ']})
With a minor variation you can check for a substring match instead (so "activ" matches "activateds"):
df['substring'] = [any(w in c for w in k.split(','))
                   for c, k in zip(df['comments'], df['keywords'])]
Output:
name comments keywords substring
0 paul account is active active,activated,activ True
1 john account is activated active,activated,activ True
2 max account is activateds active,activated,activ True
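Both variants above can be wrapped in one small helper. The following is only a sketch: the name has_keyword and the whole_word flag are hypothetical, and the case-insensitive comparison and the stripping of blank keywords are assumptions that go beyond the original question. The blank-keyword check matters because an empty string is a substring of every string, so a trailing comma in keywords would otherwise make every row match.

```python
def has_keyword(comment: str, keywords: str, whole_word: bool = False) -> bool:
    """Return True if any comma-separated keyword occurs in the comment."""
    # Assumption: compare case-insensitively and ignore blank keywords,
    # because '' is a substring of every string.
    words = [w.strip().lower() for w in keywords.split(',') if w.strip()]
    text = comment.lower()
    if whole_word:
        # Whole-token comparison, like the set-intersection version.
        return not set(text.split()).isdisjoint(words)
    # Substring comparison, like the any(...) version.
    return any(w in text for w in words)


comments = ['account is active', 'account is activated', 'account is activateds']
keywords = 'active,activated,activ'

substring_result = [has_keyword(c, keywords) for c in comments]
whole_word_result = [has_keyword(c, keywords, whole_word=True) for c in comments]
print(substring_result)   # [True, True, True]
print(whole_word_result)  # [True, True, False]
```

With the DataFrame above, the same helper would be used as df['match'] = [has_keyword(c, k) for c, k in zip(df['comments'], df['keywords'])].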
CodePudding user response:
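If every row has the same keywords, you can build a single regex from the first row and pass it to str.contains. Escape each keyword so that regex metacharacters match literally, and leave out the capturing groups, which make str.contains emit a warning:
import re
keys = '|'.join(re.escape(x) for x in df['keywords'].iloc[0].split(','))
df['comments'].str.contains(keys)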
Output:
0 True
1 True
2 True
Name: comments, dtype: bool
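The regex approach above reads its keywords from a single row. If the keywords can differ from row to row, one option is to build a pattern per row. The sketch below uses only the standard re module on plain lists. The helper name keyword_pattern is hypothetical, and escaping each keyword with re.escape and skipping blank keywords are assumptions about the data, not part of the original answer.

```python
import re


def keyword_pattern(keywords: str) -> re.Pattern:
    """Compile one alternation regex from a comma-separated keyword string."""
    # Drop blank entries: an empty alternative would match every comment.
    words = [w.strip() for w in keywords.split(',') if w.strip()]
    if not words:
        # '(?!)' is an empty negative lookahead, which never matches.
        return re.compile(r'(?!)')
    # re.escape makes metacharacters such as '.' or '+' match literally.
    return re.compile('|'.join(re.escape(w) for w in words))


comments = ['account is active', 'account is activated', 'account is activateds']
keywords = ['active,activated,activ'] * 3

# One pattern per row, so each row may carry its own keyword list.
matches = [bool(keyword_pattern(k).search(c)) for c, k in zip(comments, keywords)]
print(matches)  # [True, True, True]

# Escaping matters: as a raw regex, '1.5' would also match '125'.
escaped_hit = bool(keyword_pattern('1.5').search('version 125'))
raw_hit = bool(re.search('1.5', 'version 125'))
print(escaped_hit, raw_hit)  # False True
```

With the DataFrame from the first answer, this would become df['match'] = [bool(keyword_pattern(k).search(c)) for c, k in zip(df['comments'], df['keywords'])].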