Home > database >  Dataframe new columns to tell if the row contains column's header text
Dataframe new columns to tell if the row contains column's header text

Time:05-14

2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column's header text.

Example as the second screenshot.

enter image description here

Some lines work for a few columns. When there are a lot of new columns, it's not efficient.

What would be the good way to do so? Thank you.

import pandas as pd
from io import StringIO

csvfile = StringIO(
'''Name\tNote
Mike\tBright, Kind
Lily\tFriendly
Kate\tConsiderate, energetic
John\tReliable, friendly
Ale\tBright''')

df = pd.read_csv(csvfile, sep = '\t', engine='python')

col_list =  df['Note'].tolist()

n_list = []
for c in col_list:
    for _ in c.split(','):
        n_list.append(_)

df = df.assign(**dict.fromkeys(n_list, ''))
    
df["Bright"][df['Note'].str.contains("Bright")] = "Yes"

CodePudding user response:

You can try .str.get_dummies then replace 1 with Yes

df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df)

   Name                    Note Bright Considerate Friendly Kind Reliable energetic friendly
0  Mike            Bright, Kind    Yes                       Yes
1  Lily                Friendly                         Yes
2  Kate  Considerate, energetic                Yes                              Yes
3  John      Reliable, friendly                                       Yes                Yes
4   Ale                  Bright    Yes
  • Related