Could you please help me with my task? I have two DataFrames in Python. One of them (df1) has a column of text strings. The second one (df2) has a column of other text values.
df1:
some text |
---|
hello |
world |
my name is |
nick |
df2:
text to find |
---|
d |
z |
x |
h |
I need to check whether at least one value of df2['text to find'] occurs in each value of df1['some text'] and set a flag next to each value of df1. In the end I need to get something like this:
some text | flag |
---|---|
hello | 1 |
world | 1 |
my name is | 0 |
nick | 0 |
Thank you in advance!
CodePudding user response:
Use Series.str.contains with the values joined by | (regex "or"), then cast the boolean result to 1/0 by converting to integers:
df1['flag'] = df1['some text'].str.contains('|'.join(df2['text to find'])).astype(int)
print (df1)
some text flag
0 hello 1
1 world 1
2 my name is 0
3 nick 0
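Because the joined string is interpreted as a regular expression, values containing characters such as `.`, `+` or `(` would not be matched literally. A minimal self-contained sketch of the same approach, assuming the search values should be treated as plain text (the `re.escape` step and the `na=False` argument are additions, not part of the answer above):

```python
import re
import pandas as pd

# Sample data taken from the question
df1 = pd.DataFrame({'some text': ['hello', 'world', 'my name is', 'nick']})
df2 = pd.DataFrame({'text to find': ['d', 'z', 'x', 'h']})

# Escape each value so regex metacharacters are matched literally,
# then join with | to build an "any of these" pattern
pat = '|'.join(re.escape(x) for x in df2['text to find'])

# na=False is an assumption: missing values in df1 count as "no match",
# which also keeps the cast to int from failing on NaN
df1['flag'] = df1['some text'].str.contains(pat, na=False).astype(int)
print(df1)
```

With the question's data the escaping changes nothing, so the flags are the same as in the output above.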
If necessary, match whole words only by using word boundaries:
print (df1)
some text
0 hello
1 world
2 my name is #<- matches my
3 nick myamar #<- doesn't match my, which is only a substring here
print (df2)
text to find
0 my
1 z
2 x
3 h
df1['flag'] = df1['some text'].str.contains('|'.join(df2['text to find'])).astype(int)
pat = '|'.join(r"\b{}\b".format(x) for x in df2['text to find'])
df1['flag1'] = df1['some text'].str.contains(pat).astype(int)
print (df1)
some text flag flag1
0 hello 1 0
1 world 0 0
2 my name is 1 1
3 nick myamar 1 0
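The word-boundary variant can also be written with a single pair of `\b` anchors around a non-capturing group. This is a sketch under the assumption that the search values are plain words; the `re.escape` call and the `(?:...)` grouping are additions for illustration, not part of the answer above:

```python
import re
import pandas as pd

# Sample data from the word-boundary example
df1 = pd.DataFrame({'some text': ['hello', 'world', 'my name is', 'nick myamar']})
df2 = pd.DataFrame({'text to find': ['my', 'z', 'x', 'h']})

# Escape the values, join them with |, and wrap the alternation in a
# non-capturing group so the \b anchors apply to every alternative
words = '|'.join(re.escape(x) for x in df2['text to find'])
pat = r'\b(?:{})\b'.format(words)

df1['flag1'] = df1['some text'].str.contains(pat).astype(int)
print(df1)
```

The group is non-capturing on purpose: `str.contains` emits a warning when the pattern contains capturing groups, and no captured text is needed here.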