Finding at least one value of one dataframe in string values of another dataframe

Time:11-25

Could you please help me with my task? I have two DataFrames in Python. One of them (df1) has a column of text strings. The second one (df2) has a column of other text values.

df1:

some text
hello
world
my name is
nick

df2:

text to find
d
z
x
h

I need to check whether at least one value of df2['text to find'] occurs in each string of df1['some text'], and set a flag next to each value of df1. In the end I need to get something like this:

some text flag
hello 1
world 1
my name is 0
nick 0

Thank you in advance!

CodePudding user response:

Use Series.str.contains with the df2 values joined by | (regex "or"), then cast the boolean result to 1/0 by converting it to integers:

df1['flag'] = df1['some text'].str.contains('|'.join(df2['text to find'])).astype(int)
print (df1)
    some text  flag
0       hello     1
1       world     1
2  my name is     0
3        nick     0
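Because the joined string is interpreted as a regular expression, values in df2 that contain characters such as . or + would be treated as regex syntax. A minimal self-contained sketch that escapes each value first is shown here. It rebuilds the sample frames from the question. Escaping with re.escape and passing na=False are additions that assume the values should match literally and that missing text should count as no match:

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({"some text": ["hello", "world", "my name is", "nick"]})
df2 = pd.DataFrame({"text to find": ["d", "z", "x", "h"]})

# Escape each search value so it is matched literally, then join with "|" (regex "or")
pattern = "|".join(re.escape(term) for term in df2["text to find"])

# na=False makes missing values in df1 count as "no match" instead of producing NaN
df1["flag"] = df1["some text"].str.contains(pattern, na=False).astype(int)

print(df1)
```

For the sample data this should give the same flags as the one-liner above.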

If you need to match whole words only, add word boundaries (\b) around each value:

print (df1)
     some text
0        hello
1        world
2   my name is #<- should match "my" (whole word)
3  nick myamar #<- should not match "my" (only a substring of "myamar")

print (df2)
  text to find
0           my
1            z
2            x
3            h

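# Plain substring match: "my" also matches inside "myamar"
df1['flag'] = df1['some text'].str.contains('|'.join(df2['text to find'])).astype(int)

# Whole-word match: wrap each value in \b word boundaries
pat = '|'.join(r"\b{}\b".format(x) for x in df2['text to find'])

df1['flag1'] = df1['some text'].str.contains(pat).astype(int)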

print (df1)
     some text  flag  flag1
0        hello     1      0
1        world     0      0
2   my name is     1      1
3  nick myamar     1      0
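The whole-word variant can also be written with a single pair of \b anchors around a non-capturing group. The sketch below is self-contained and rebuilds the second set of sample frames. The names alternatives and word_pattern are only illustrative, and escaping the values with re.escape assumes they should match literally:

```python
import re

import pandas as pd

# Sample data rebuilt from the word-boundary example
df1 = pd.DataFrame({"some text": ["hello", "world", "my name is", "nick myamar"]})
df2 = pd.DataFrame({"text to find": ["my", "z", "x", "h"]})

# Escape each value, join with "|", and wrap the alternatives in one
# non-capturing group so a single pair of \b anchors applies to all of them
alternatives = "|".join(re.escape(term) for term in df2["text to find"])
word_pattern = rf"\b(?:{alternatives})\b"

# 1 only when a value occurs as a whole word, so "my" inside "myamar" is ignored
df1["flag1"] = df1["some text"].str.contains(word_pattern, na=False).astype(int)

print(df1)
```

Note that \b only marks a boundary next to letters, digits, or underscores, so values that begin or end with punctuation may need different handling.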