Home > front end >  Check if one of elements in list is in dataframe column
Check if one of elements in list is in dataframe column

Time:03-05

I have DF that looks like below:

columna_1   column_2
1           I am Thomas
2           Are you Thomas
3           How are you?
4           I am fine...
5           Jack, what's up?

and I have a list like this:

names = ["Thomas", "Jack"]

What I need is to find values from list names if there are in column_2 and if there are, it should add new column with the values from list. So it should be like this:

columna_1   column_2          column_3
1           I am Thomas       Thomas
2           Are you Thomas    Thomas
5           Jack, what's up?  Jack

I have started with finding just values but I am not sure how I could add a new column with correct values. Below is my code:

df[df['column_2'].astype("str").apply(lambda x: set(names).issubset(x))]

Do you have any idea? Thanks for help!

CodePudding user response:

Use a regex:

import re
regex = fr'\b({"|".join(map(re.escape, names))})\b'
df['column_3'] = df['column_2'].str.extract(regex, expand=False)

To also drop the non matches:

import re
regex = fr'\b({"|".join(map(re.escape,names))})\b'
(df.assign(column_3=df['column_2'].str.extract(regex, expand=False))
   .dropna(subset=['column_3'])
)

output:

   columna_1          column_2 column_3
0          1       I am Thomas   Thomas
1          2    Are you Thomas   Thomas
4          5  Jack, what's up?     Jack

NB. Note that in case both Thomas and Jack are in the sentence, only the first one in the sentence will match

  • Related