What I have. I have a column 'Student' with students name and their personalities. I have a list named as 'qualities' that consisits of qualities that is required for filtering purpose. What I want. I want a column next to to the 'Student' that returns the matching string from the list.
#What I Have
import pandas as pd
Personality = {'Student':["Aysha is clever", "Ben is stronger", "Cathy is clever and strong", "Dany is intelligent", "Ella is naughty", "Fred is quieter"]}
index_labels=['1','2','3','4','5','6']
df = pd.DataFrame(Personality,index=index_labels)
qualities = ['calm', 'clever', 'quiet', 'bold', 'strong', 'cute']
#What I want
CodePudding user response:
Use str.findall and then split by ,
df['ex'] = df['Student'].str.findall('|'.join(qualities)).apply(set).str.join(', ')
new = df["ex"].str.split(pat = ",", expand=True)[1]
df =pd.concat([df, new], axis = 1)
df = df.fillna('')
print(df)
Gives #
Student ex 1
1 Aysha is clever clever
2 Ben is stronger strong
3 Cathy is clever and strong clever, strong strong
4 Dany is intelligent
5 Ella is naughty
6 Fred is quieter quiet
CodePudding user response:
You can use str.extractall
and unstack
, then join
to the original DataFrame:
import re
pattern = '|'.join(map(re.escape, qualities))
out = df.join(df['Student'].str.extractall(f'({pattern})')[0].unstack())
Output:
Student 0 1
1 Aysha is clever clever NaN
2 Ben is stronger strong NaN
3 Cathy is clever and strong clever strong
4 Dany is intelligent NaN NaN
5 Ella is naughty NaN NaN
6 Fred is quieter quiet NaN