2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column's header text.
Example as the second screenshot.
Some lines work for a few columns. When there are a lot of new columns, it's not efficient.
What would be the good way to do so? Thank you.
import pandas as pd
from io import StringIO
csvfile = StringIO(
'''Name\tNote
Mike\tBright, Kind
Lily\tFriendly
Kate\tConsiderate, energetic
John\tReliable, friendly
Ale\tBright''')
df = pd.read_csv(csvfile, sep = '\t', engine='python')
col_list = df['Note'].tolist()
n_list = []
for c in col_list:
for _ in c.split(','):
n_list.append(_)
df = df.assign(**dict.fromkeys(n_list, ''))
df["Bright"][df['Note'].str.contains("Bright")] = "Yes"
CodePudding user response:
You can try .str.get_dummies
then replace 1
with Yes
df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df)
Name Note Bright Considerate Friendly Kind Reliable energetic friendly
0 Mike Bright, Kind Yes Yes
1 Lily Friendly Yes
2 Kate Considerate, energetic Yes Yes
3 John Reliable, friendly Yes Yes
4 Ale Bright Yes