I want to convert the categories to binary columns. Each value of the category column should become a new column holding 0 or 1 for each id, depending on whether that value is present for that id.
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 1, 3, 3],
                   "value1": ["ryan", "delta", "delta", "delta", "alpha"],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher"],
                   "value2": [1, 1, 2, 3, 7]})
df
The resulting df should be:
finaldf = pd.DataFrame({"id": [0, 1, 3],
                        "teacher": [1, 0, 1],
                        "pilot": [0, 1, 1],
                        "engineer": [0, 1, 0]})
Answer:
Use pd.get_dummies:
# dtype=int gives 0/1 columns; since pandas 2.0 the default dtype is bool (True/False)
finaldf = pd.get_dummies(df, columns=["category"], prefix="", prefix_sep="", dtype=int)
output:
   id value1  value2  engineer  pilot  teacher
0   0   ryan       1         0      0        1
1   1  delta       1         0      1        0
2   1  delta       2         1      0        0
3   3  delta       3         0      1        0
4   3  alpha       7         0      0        1
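get_dummies keeps one row per original row, so an id that occurs more than once (1 and 3 here) still spans several rows. A minimal sketch of collapsing those rows to one per id, assuming the same df as in the question, is to build the dummy columns from category alone and take the per-id maximum. This groupby/max combination is an alternative I am suggesting, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 1, 3, 3],
                   "value1": ["ryan", "delta", "delta", "delta", "alpha"],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher"],
                   "value2": [1, 1, 2, 3, 7]})

# One 0/1 column per category, still one row per original row.
dummies = pd.get_dummies(df["category"], dtype=int)

# Group the indicator rows by id (aligned on the shared row index);
# max() is 1 if the category appears in any row for that id, else 0.
collapsed = dummies.groupby(df["id"]).max().reset_index()
print(collapsed)
```

The value1 and value2 columns are dropped here because they have no single per-id value to keep.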
Edit for the updated question, where an id can appear in more than one row:
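Use pd.crosstab, then reset_index() to turn id back into a column and rename_axis(columns=None) to drop the leftover "category" label on the column axis: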
finaldf = pd.crosstab(df["id"], df["category"]).reset_index().rename_axis(columns=None)
output:
   id  engineer  pilot  teacher
0   0         0      0        1
1   1         1      1        0
2   3         0      1        1
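crosstab counts occurrences, so if the same (id, category) pair appeared twice the cell would hold 2 rather than 1. The sample data happens to have no such repeats. The sketch below assumes you want strictly 0/1 values in the column order of the expected finaldf. It clips the counts at 1 and then selects the columns in that order. The extra duplicate row (id 3, "pilot") is hypothetical and exists only to exercise the clip. Only the id and category columns are needed.

```python
import pandas as pd

# Question data (id and category only) plus one hypothetical duplicate row: id 3, "pilot".
df = pd.DataFrame({"id": [0, 1, 1, 3, 3, 3],
                   "category": ["teacher", "pilot", "engineer", "pilot", "teacher", "pilot"]})

finaldf = (
    pd.crosstab(df["id"], df["category"])
    .clip(upper=1)              # counts > 1 become 1, so the result stays binary
    .reset_index()              # move id from the index back into a column
    .rename_axis(columns=None)  # drop the "category" label on the column axis
)

# Reorder to match the column order of the expected answer in the question.
finaldf = finaldf[["id", "teacher", "pilot", "engineer"]]
print(finaldf)
```

Without the clip(upper=1) step, the cell for id 3 / pilot would be 2.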