Check if pandas dataframe contains specific string from a list of items

Time:03-24

I have a list

my_list = ['element1 line','element2 ','element3', 'element4 line',....]

and a pandas DataFrame with a df['Sentences'] column and a df['flag'] column:

df
    Sentences               flag
0   abcd    
1   efgh    
2   element1 ijkl           
3   mnop element3 element4      
4   qrst

I want to iterate over every row of the Sentences column. If any element of my_list is present in a row's sentence, df['flag'] should be 1 for that row. If no element is present in that row's sentence, df['flag'] should be 0 for that row.

Expected output:

df
    Sentences                flag
0   abcd                      0
1   efgh                      0
2   element1 ijkl             1 
3   mnop element3 element4    1     
4   qrst                      0

CodePudding user response:

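# Flag a row when any list item occurs as a substring of the sentence.
# (Testing `x in my_list` would only match a sentence that equals a list item exactly.)
df['flag'] = df['Sentences'].apply(lambda x: 1 if any(item in x for item in my_list) else 0)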

CodePudding user response:

You can use a list comprehension that checks each whitespace-separated word of a sentence against my_list:

df['flag'] = [int(any(w in my_list for w in x.split())) for x in df['Sentences']]

output:

                Sentences  flag
0                    abcd     0
1                    efgh     0
2           element1 ijkl     1
3  mnop element3 element4     1
4                    qrst     0
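
One caveat with the word-by-word check: it can only match list items that are single words, so an item containing a space never matches. Here is a minimal sketch of the difference between the word-level test and a substring test. The list and the sentence are made-up values for illustration, not the asker's data.

```python
# Hypothetical inputs, chosen only to show the difference.
items = ['element1 line', 'element3']
sentence = 'abcd element1 line efgh'

# Word-level test: each word must equal a whole list item.
# 'element1' and 'line' are separate words, so 'element1 line' is never found.
word_match = any(word in items for word in sentence.split())

# Substring test: each list item is searched for inside the full sentence.
substring_match = any(item in sentence for item in items)

print(word_match, substring_match)
```

If the list only holds single words, the two tests agree for whole-word hits, but the substring test also fires on partial words (for example 'element3' inside 'element30').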

Note that you could do this with pandas methods alone, but it is much slower:

df['flag'] = (df['Sentences']
              .str.split()
              .explode().isin(my_list)
              .groupby(level=0).any().astype(int)
              )
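
If substring matching is what you want, another option is Series.str.contains with a single regular expression built from the list. This is only a sketch under assumptions: the sample frame and the `items` list below are invented for illustration, and each item is passed through re.escape so it is matched literally.

```python
import re

import pandas as pd

# Hypothetical sample data; the item list is an assumption, not the asker's list.
items = ['element1', 'element2 ', 'element3', 'element4 line']
df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})

# Escape every item so regex metacharacters are treated as plain text,
# then join the items into one alternation pattern.
pattern = '|'.join(re.escape(item) for item in items)

# str.contains returns a boolean Series; na=False counts missing values as no match.
df['flag'] = df['Sentences'].str.contains(pattern, regex=True, na=False).astype(int)

print(df)
```

Unlike the split-based versions, this also matches list items that contain spaces, and it matches inside longer words as well.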