optimal way to split values of a column in different columns as 1 or 0 (one hot encoding)

Time:01-05

I have a pandas column named coverage, the values can be:

'DAMAGE', 'DAMAGE-THEFT', 'DAMAGE-THEFT-WARRANTY_EXTENSION', 'DAMAGE-FRAUDULENT_USE', etc.

What would be the best way to get a column named DAMAGE, another named THEFT, another named WARRANTY_EXTENSION and another named FRAUDULENT_USE, with a 1 or 0 in each row depending on whether the row has that type of coverage?

I thought about writing a lambda function, but I think I would need to call it once per column:

df['DAMAGE'] = df.apply (lambda row: my_function_to_split(row), axis=1)
df['THEFT'] = df.apply (lambda row: my_function_to_split(row), axis=1)
etc...

Thanks in advance.

CodePudding user response:

I think the method you're looking for is pd.get_dummies.

So, if you have a dataframe with multiple columns and want to apply this method to only some of them, you can do:

import pandas as pd
names = ["a", "b", "a", "c"]
df = pd.DataFrame({"name": names, "value": list(range(len(names)))})
pd.get_dummies(df, columns=["name"])

[EDIT]

The question is trickier than that, but you can solve it like this:

import pandas as pd
df = pd.DataFrame({"name": ["a-b", "b-c", "a", "a-c", "c"]})
df["name"].str.get_dummies(sep="-")
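Applied to the column from the question, this could look roughly like the sketch below. The sample rows are only the values listed in the question, and joining the result back onto the original dataframe is my assumption about what you want to end up with.

```python
import pandas as pd

# Sample data built from the values listed in the question
df = pd.DataFrame({
    "coverage": [
        "DAMAGE",
        "DAMAGE-THEFT",
        "DAMAGE-THEFT-WARRANTY_EXTENSION",
        "DAMAGE-FRAUDULENT_USE",
    ]
})

# Split each value on "-" and build one 0/1 column per distinct coverage type.
# The underscore in WARRANTY_EXTENSION is untouched because only "-" is the separator.
flags = df["coverage"].str.get_dummies(sep="-")

# Attach the indicator columns to the original dataframe (keeps the coverage column)
df = df.join(flags)
print(df)
```

This avoids one apply call per coverage type: the columns are derived from whatever values appear in the data, so a new coverage type gets its own column without any code change.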