I have a pandas column named coverage whose values can be:
'DAMAGE', 'DAMAGE-THEFT', 'DAMAGE-THEFT-WARRANTY_EXTENSION', 'DAMAGE-FRAUDULENT_USE', etc.
What is the best way to create a column named DAMAGE, another named THEFT, another named WARRANTY_EXTENSION and another named FRAUDULENT_USE, and set each row to 1 or 0 depending on whether it has that type of coverage?
I thought about creating a lambda function, but I think I would need to repeat it for every column:
df['DAMAGE'] = df.apply(lambda row: my_function_to_split(row), axis=1)
df['THEFT'] = df.apply(lambda row: my_function_to_split(row), axis=1)
etc.
Thanks in advance.
CodePudding user response:
I think the method you're looking for is pd.get_dummies.
So, if you have a dataframe with multiple columns and want to apply this method to only some of them, you can do:
import pandas as pd
names = ["a", "b", "a", "c"]
df = pd.DataFrame({"name": names, "value": list(range(len(names)))})
pd.get_dummies(df, columns=["name"])
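One caveat, shown here as a small sketch: the sample values are borrowed from the question, and the frame itself is only an illustration. Plain get_dummies treats each full string as a single category, so a combined value such as 'DAMAGE-THEFT' gets its own column instead of being split:

```python
import pandas as pd

# Illustrative data using values from the question's coverage column
df = pd.DataFrame({"coverage": ["DAMAGE", "DAMAGE-THEFT", "DAMAGE"]})

# Each distinct string becomes one indicator column, prefixed with the column name
dummies = pd.get_dummies(df, columns=["coverage"])
print(dummies.columns.tolist())
```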
[EDIT]
The question is trickier than that, but you can solve it like this:
import pandas as pd
df = pd.DataFrame({"name": ["a-b", "b-c", "a", "a-c", "c"]})
df["name"].str.get_dummies(sep="-")
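Applied to the coverage column from the question, it could look something like the sketch below. The sample rows are just the values listed in the question, and the name flags is made up for the example. Joining the result back keeps the original column next to the new 0/1 columns:

```python
import pandas as pd

# Sample rows built from the values listed in the question
df = pd.DataFrame({
    "coverage": [
        "DAMAGE",
        "DAMAGE-THEFT",
        "DAMAGE-THEFT-WARRANTY_EXTENSION",
        "DAMAGE-FRAUDULENT_USE",
    ]
})

# Split each value on "-" and build one 0/1 column per distinct coverage type
flags = df["coverage"].str.get_dummies(sep="-")

# Attach the indicator columns to the original dataframe
df = df.join(flags)
print(df)
```

This way there is no need for one apply call per coverage type; any new coverage type that shows up in the data gets its own column automatically.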