How to add a prefix to column names according to data in another column

There is a dataframe like the one below:

import pandas as pd

data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8],
        'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
        'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']}
df = pd.DataFrame(data)

I want to make dummy columns for the “cluster_label” column

pd.get_dummies(df, columns=['cluster_label'])

However, I need to add a prefix based on the LABEL column of the same row.

Basically, the columns must be text_c_0, logo_c_0, … How can I do that?

Many thanks in advance.

CodePudding user response：

Do you just need the prefixed column names? If so:

prefixed_columns_names = [f"{label}_{cluster}" for label, cluster in zip(data["LABEL"], data["cluster_label"])]

print(prefixed_columns_names)
# ['text_c_0', 'logo_c_0', 'logo_c_0', 'person_c_1', 'text_c_1', 'text_c_2', 'person_c_2', 'logo_c_3']
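
The list above only gives one name per row. If an indicator table is wanted, one possible sketch is to cross-tabulate ID against the combined name. Note that this swaps in pd.crosstab instead of get_dummies, and the variable names key and table are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "LABEL": ["text", "logo", "logo", "person", "text", "text", "person", "logo"],
    "cluster_label": ["c_0", "c_0", "c_0", "c_1", "c_1", "c_2", "c_2", "c_3"],
})

# One combined name per row, e.g. 'text_c_0'
key = df["LABEL"] + "_" + df["cluster_label"]

# Rows are IDs, columns are the distinct combined names, values are counts (0 or 1 here)
table = pd.crosstab(df["ID"], key)

print(table)
```

Because every ID appears once in this data, each row of table contains a single 1 and zeros elsewhere.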

CodePudding user response：

Try this:

import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
    'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']
}

df = pd.DataFrame(data)

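# Plain dummies, without the LABEL prefix
pd.get_dummies(df, columns=['cluster_label'])

# Combine LABEL and cluster_label into one key per row, e.g. 'text_c_0'
df['dummy'] = df.apply(lambda row: row['LABEL'] + '_' + row['cluster_label'], axis=1)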

pd.get_dummies(df['dummy'])

# If you want to keep ['ID', 'LABEL', 'cluster_label'] in your df:
df = df.join(pd.get_dummies(df['dummy']))
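
The same idea can probably be written without the row-wise apply, since pandas string columns can be concatenated directly. The following is a minimal sketch under that assumption; the names combined, dummies and result are illustrative, and dtype=int is passed because pandas 2.0 and later return boolean dummy columns by default:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "LABEL": ["text", "logo", "logo", "person", "text", "text", "person", "logo"],
    "cluster_label": ["c_0", "c_0", "c_0", "c_1", "c_1", "c_2", "c_2", "c_3"],
})

# Vectorized concatenation instead of df.apply(..., axis=1)
combined = df["LABEL"] + "_" + df["cluster_label"]

# One 0/1 column per distinct combined value
dummies = pd.get_dummies(combined, dtype=int)

# Keep the original columns next to the new indicator columns
result = df.join(dummies)

print(result.columns.tolist())
```

This avoids storing the intermediate 'dummy' column in df, so the joined result holds only the original three columns plus the indicator columns.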