I'm trying to split the column Class into multiple columns, one per unique value,
with each new column named after the value it holds.
   ID    Name Class
0  12    John     A
1  13    Mark     A
2  14    Tony     B
3  15  Marcus     C
4  16   Phill     D
5  17    Jack     A
final df
   ID    Name Class  A  B  C  D
0  12    John     A  A
1  13    Mark     A  A
2  14    Tony     B     B
3  15  Marcus     C        C
4  16   Phill     D           D
5  17    Jack     A  A
CodePudding user response:
A potentially slow way of doing this is to define a function and apply it row by row, once for each unique value in the original column.
# define a function that returns the value if the row matches it, else None
def new_column_val(row, value, column):
    if row[column] == value:
        return value
    return None

# create one new column per unique class
for class_name in df["Class"].unique():
    df[class_name] = df.apply(new_column_val, args=(class_name, "Class"), axis=1)
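The row-by-row apply above can probably be avoided. A minimal sketch, assuming the sample data from the question (the DataFrame is rebuilt here only so the snippet runs on its own): Series.where keeps a value where the condition holds and puts NaN everywhere else, so each new column takes one vectorized comparison.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [12, 13, 14, 15, 16, 17],
    "Name": ["John", "Mark", "Tony", "Marcus", "Phill", "Jack"],
    "Class": ["A", "A", "B", "C", "D", "A"],
})

# One vectorized comparison per class instead of one Python call per row
for class_name in df["Class"].unique():
    # keep the class label where it matches, NaN elsewhere
    df[class_name] = df["Class"].where(df["Class"] == class_name)

print(df)
```

The loop still runs once per unique class, but the work inside it is done by pandas rather than by a Python function called on every row.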
CodePudding user response:
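You can use pd.get_dummies:
import numpy as np
import pandas as pd

# one indicator column per class; cast to bool so it behaves the same across pandas versions
dummies = pd.get_dummies(df["Class"]).astype(bool)
# replace True with the column's own name and False with NaN
mask = dummies.apply(lambda col: col.map({True: col.name, False: np.nan}))
final = df.join(mask)
final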
   ID    Name Class  A  B  C  D
0  12    John     A  A
1  13    Mark     A  A
2  14    Tony     B     B
3  15  Marcus     C        C
4  16   Phill     D           D
5  17    Jack     A  A
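If you want truly blank cells, as in the question, rather than missing values, one variation on the get_dummies idea is to fill the non-matching positions with empty strings. This is only a sketch on the question's sample data; the names dummies and labels are mine, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [12, 13, 14, 15, 16, 17],
    "Name": ["John", "Mark", "Tony", "Marcus", "Phill", "Jack"],
    "Class": ["A", "A", "B", "C", "D", "A"],
})

# Boolean indicator matrix: one column per class, True where the row belongs to it
dummies = pd.get_dummies(df["Class"]).astype(bool)

# In each indicator column, put the column's name where True and "" where False
labels = dummies.apply(lambda col: col.map({True: col.name, False: ""}))

final = df.join(labels)
print(final)
```

Empty strings print as blanks, but they are not treated as missing by isna() or dropna(), so this form is better suited to display than to further analysis.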
CodePudding user response:
import numpy as np

uniq_class = df['Class'].unique().tolist()
# build a diagonal matrix with the unique class names on the diagonal
# (the off-diagonal entries are empty strings)
D = np.diag(uniq_class).tolist()
# build a dictionary mapping each class name to its row of the matrix
temp = dict(zip(uniq_class, D))
# look up each row's class in the dictionary and expand the result into new columns
df[uniq_class] = df['Class'].map(temp).tolist()
df
Output:
   ID    Name Class  A  B  C  D
0  12    John     A  A
1  13    Mark     A  A
2  14    Tony     B     B
3  15  Marcus     C        C
4  16   Phill     D           D
5  17    Jack     A  A
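To see why the diagonal-matrix trick works, it may help to look at the intermediate lookup table by itself. A small sketch, with the class list hard-coded from the question's data instead of being read from df:

```python
import numpy as np

# Unique classes as they appear in the question's data
uniq_class = ["A", "B", "C", "D"]

# np.diag on a 1-D array of strings puts them on the diagonal
# and fills every other cell with an empty string
D = np.diag(uniq_class).tolist()

# Each class name maps to the row that holds it in its own position
temp = dict(zip(uniq_class, D))

print(temp["B"])  # ['', 'B', '', '']
```

Mapping the Class column through this dictionary therefore gives every row a ready-made list of four cell values, which is what gets assigned to the new columns.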