Check if a substring is in a string python

Time:12-08

I have two dataframes. I need to check whether each substring from the first df appears in any string of the second df, and get a list of the words that do appear in the second df.

First df(word):

word
apples
dog
cat
cheese

Second df(sentence):

sentence
apples grow on a tree
...
I love cheese

I tried this one:

tru=[]
for i in word['word']:
    if i in sentence['sentence'].values:    
        tru.append(i)

And this one:

tru=[]
for i in word['word']:
    if sentence['sentence'].str.contains(i):    
        tru.append(i)

I expect to get a list like ['apples',..., 'cheese']

CodePudding user response:

One possible way is to use Series.str.extractall:

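import re
import pandas as pd

df_word = pd.Series(["apples", "dog", "cat", "cheese"])
df_sentence = pd.Series(["apples grow on a tree", "i love cheese"])

# Join the words into one alternation pattern inside a capture group;
# re.escape keeps any regex metacharacters in the words literal.
pattern = f"({'|'.join(map(re.escape, df_word))})"
matches = df_sentence.str.extractall(pattern)
matches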

Output:


              0
  match
0 0      apples
1 0      cheese

You can then convert the results to a list:

matches[0].unique().tolist()

Output:

['apples', 'cheese']
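
As for why the two loops in the question don't work: the first one (`i in sentence['sentence'].values`) checks whether a word is equal to a whole cell, not whether it occurs inside one, so nothing matches. The second one passes a boolean Series to `if`, which raises a ValueError because the truth value of a Series is ambiguous. A minimal sketch of the loop with that fixed, assuming the two dataframes look like the ones in the question (only the two sentences shown there are used, the elided rows are left out) and that a plain substring match is what you want:

```python
import pandas as pd

# Assumed reconstruction of the question's dataframes (elided rows omitted)
word = pd.DataFrame({"word": ["apples", "dog", "cat", "cheese"]})
sentence = pd.DataFrame({"sentence": ["apples grow on a tree", "I love cheese"]})

# str.contains returns one boolean per sentence; .any() collapses that
# to a single True/False, which is what the `if` needs.
# regex=False treats each word as a literal substring.
tru = [
    w
    for w in word["word"]
    if sentence["sentence"].str.contains(w, regex=False).any()
]
print(tru)
```

This keeps the words in the order of the first df. Note that it is a substring test, so "cat" would also match a sentence containing "category".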