I have a dataframe (df) that looks something like this:
Shape | Weight | Colour |
---|---|---|
Circle | 5 | Blue, Red |
Square | 7 | Yellow, Red |
Triangle | 8 | Blue, Yellow, Red |
Rectangle | 10 | Green |
I would like to one-hot encode the "Colour" column (one 0/1 column per colour) so that the dataframe looks like this:
Shape | Weight | Blue | Red | Yellow | Green |
---|---|---|---|---|---|
Circle | 5 | 1 | 1 | 0 | 0 |
Square | 7 | 0 | 1 | 1 | 0 |
Triangle | 8 | 1 | 1 | 1 | 0 |
Rectangle | 10 | 0 | 0 | 0 | 1 |
Is there an easy function to do this type of conversion? Any pointers in the right direction would be appreciated. Thanks.
Answer:
Try:
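import pandas as pd
# Split each "Colour" string into a list, tolerating stray spaces around the commas
df["Colour"] = df["Colour"].str.split(r"\s*,\s*", regex=True)
# One row per (Shape, Colour) pair
x = df.explode("Colour")
# Count colours per shape, then join the counts back onto the original rows
df_out = (
    pd.concat(
        [df.set_index("Shape"), pd.crosstab(x["Shape"], x["Colour"])], axis=1
    )
    .reset_index()
    .drop(columns="Colour")
)
print(df_out)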
Prints:
       Shape  Weight  Blue  Green  Red  Yellow
0     Circle       5     1      0    1       0
1     Square       7     0      0    1       1
2   Triangle       8     1      0    1       1
3  Rectangle      10     0      1    0       0
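A shorter alternative that may also work here is `Series.str.get_dummies`, which splits each string on a separator and returns one indicator column per distinct value. The sketch below rebuilds the sample dataframe from the question so it runs on its own; normalising the separator to a bare comma first is an assumption about how the real data is spaced, not something the question states.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame(
    {
        "Shape": ["Circle", "Square", "Triangle", "Rectangle"],
        "Weight": [5, 7, 8, 10],
        "Colour": ["Blue, Red", "Yellow, Red", "Blue, Yellow, Red", "Green"],
    }
)

# Normalise ", " / " ,"/ "," to a bare comma so every colour name is clean,
# then build one indicator column per distinct colour
dummies = (
    df["Colour"]
    .str.replace(r"\s*,\s*", ",", regex=True)
    .str.get_dummies(sep=",")
)

# Replace the original "Colour" column with the indicator columns
df_out = pd.concat([df.drop(columns="Colour"), dummies], axis=1)
print(df_out)
```

The indicator columns come out in sorted order (Blue, Green, Red, Yellow), matching the crosstab result above. Unlike `crosstab`, which counts occurrences, `get_dummies` only marks presence, so a colour repeated within one row would still give 1.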