How to search a list of strings in a data frame column and return the matching string as an adjacent-CodePudding

What I have. I have a column 'Student' with students name and their personalities. I have a list named as 'qualities' that consisits of qualities that is required for filtering purpose. What I want. I want a column next to to the 'Student' that returns the matching string from the list.

#What I Have

import pandas as pd

Personality = {'Student':["Aysha is clever", "Ben is stronger", "Cathy is clever and strong", "Dany is intelligent", "Ella is naughty", "Fred is quieter"]}
index_labels=['1','2','3','4','5','6']
df = pd.DataFrame(Personality,index=index_labels)

qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']

#What I want

Output

CodePudding user response：

Use str.findall and then split by ,

df['ex'] = df['Student'].str.findall('|'.join(qualities)).apply(set).str.join(', ')

new = df["ex"].str.split(pat = ",", expand=True)[1]
df =pd.concat([df, new], axis = 1)
df = df.fillna('')
print(df)

Gives #

                      Student              ex        1
1             Aysha is clever          clever         
2             Ben is stronger          strong         
3  Cathy is clever and strong  clever, strong   strong
4         Dany is intelligent                         
5             Ella is naughty                         
6             Fred is quieter           quiet

CodePudding user response：

You can use str.extractall and unstack, then join to the original DataFrame:

import re

pattern = '|'.join(map(re.escape, qualities))

out = df.join(df['Student'].str.extractall(f'({pattern})')[0].unstack())

Output:

                      Student       0       1
1             Aysha is clever  clever     NaN
2             Ben is stronger  strong     NaN
3  Cathy is clever and strong  clever  strong
4         Dany is intelligent     NaN     NaN
5             Ella is naughty     NaN     NaN
6             Fred is quieter   quiet     NaN