I have a dataframe like the one below:
import pandas as pd
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8],
'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']}
df = pd.DataFrame(data)
I want to create dummy columns for the "cluster_label" column:
pd.get_dummies(df, columns=['cluster_label'])
However, I need each dummy column to be prefixed with the corresponding value from the LABEL column.
Basically, the columns must be named text_c_0, logo_c_0, … How can I do that?
Many thanks in advance.
CodePudding user response:
Do you just need the prefixed column names? If so:
prefixed_columns_names = [f"{elem[0]}_{elem[1]}" for elem in list(zip(data["LABEL"], data["cluster_label"]))]
print(prefixed_columns_names)
# ['text_c_0', 'logo_c_0', 'logo_c_0', 'person_c_1', 'text_c_1', 'text_c_2', 'person_c_2', 'logo_c_3']
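Note that this list has one entry per row, so it contains duplicates (logo_c_0 appears twice). If what you want is the distinct column names, a minimal sketch could deduplicate them while keeping first-seen order. The variable names here (labels, clusters, row_names, unique_names) are only for illustration and are not from the question:

```python
labels = ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo']
clusters = ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']

# One combined name per row, e.g. 'text_c_0'
row_names = [f"{label}_{cluster}" for label, cluster in zip(labels, clusters)]

# dict.fromkeys keeps the first occurrence of each name and preserves order
unique_names = list(dict.fromkeys(row_names))
print(unique_names)
# ['text_c_0', 'logo_c_0', 'person_c_1', 'text_c_1', 'text_c_2', 'person_c_2', 'logo_c_3']
```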
CodePudding user response:
Try this:
import pandas as pd
data = {
'ID': [1, 2, 3, 4, 5, 6, 7, 8],
'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']
}
df = pd.DataFrame(data)
pd.get_dummies(df,columns=['cluster_label'])
df['dummy'] = df.apply(lambda row: row['LABEL'] + '_' + row['cluster_label'], axis=1)
pd.get_dummies(df['dummy'])
# If you want to keep ['ID', 'LABEL', 'cluster_label'] in your df, join the dummies back onto it:
df = df.join(pd.get_dummies(df['dummy']))
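As a possible alternative to the row-wise apply, the two string columns can be concatenated directly, which is usually faster on larger frames. This is only a sketch of that idea: the names combined, dummies and result are made up for illustration, and dtype=int is an optional choice to get 0/1 values rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
    'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3'],
})

# Element-wise string concatenation of the two columns, e.g. 'text_c_0'
combined = df['LABEL'] + '_' + df['cluster_label']

# One indicator column per distinct LABEL/cluster combination
dummies = pd.get_dummies(combined, dtype=int)

# Keep the original columns and append the dummy columns
result = df.join(dummies)
print(result.columns.tolist())
```

Each row of the dummy columns then has exactly one 1, in the column matching that row's LABEL and cluster_label pair.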