I am trying to replace several spellings of the values of a categorical variable in a DataFrame with a consistent set of values.
I tried the following code:
data['Gender'] = data['Gender'].replace(to_replace={"male","M","m","female","f","F"}, value={"Male","Male","Male","Female", "Female", "Female"})
I want every m, M, or male to be replaced by Male, and likewise every f, F, or female by Female.
I got the following error:
ValueError: Replacement lists must match in length. Expecting 6 got 2
Answer:
The issue with your code is that you use sets for the arguments of the method. The cardinality is fine for to_replace, as all of its elements are unique. For value, however, the set you define actually becomes {"Male", "Female"}, because a set keeps only one copy of each element, so its size (2) does not match that of to_replace (6). Even if the sizes matched, sets do not guarantee an order, so they are not a suitable data structure for the job at hand. If you use lists or tuples instead, this just works:
data['Gender'] = data['Gender'].replace(to_replace=("male","M","m","female","f","F"), value=("Male","Male","Male","Female", "Female", "Female"))
That said, using a dict may lead to code that is easier to read, because each original value is written right next to its replacement:
data["Gender"] = data["Gender"].replace({"m" : "Male", "M" : "Male", "male": "Male", "f": "Female", "F": "Female", "female": "Female"})
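A minimal sketch that checks both points above, assuming a small made-up data frame (the sample values, including the unmatched "X", are hypothetical): the set literal really does shrink to two elements, and the tuple and dict versions of the call produce the same column.

```python
import pandas as pd

# Hypothetical sample data; "X" deliberately has no replacement defined
data = pd.DataFrame({"Gender": ["male", "M", "m", "female", "f", "F", "X"]})

# A set literal discards duplicates, which is where "got 2" comes from
value_as_set = {"Male", "Male", "Male", "Female", "Female", "Female"}
print(len(value_as_set))  # 2

# Tuples keep length and order: each old value pairs with the new value at the same position
old = ("male", "M", "m", "female", "f", "F")
new = ("Male", "Male", "Male", "Female", "Female", "Female")
via_tuples = data["Gender"].replace(to_replace=old, value=new)

# The dict form spells out the same old -> new pairs explicitly
via_dict = data["Gender"].replace({"m": "Male", "M": "Male", "male": "Male",
                                   "f": "Female", "F": "Female", "female": "Female"})

print(via_tuples.equals(via_dict))  # True

# Values without a mapping entry ("X") are left unchanged by replace()
print(via_dict.tolist())  # ['Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'X']
```

If unmatched values should become missing instead, Series.map with the same dict does that; replace keeps them as they are.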
Answer:
Here is one way.
import pandas as pd
df = pd.DataFrame({'Gender': ['m', 'M', 'f', 'F', 'm']})
print(df)
  Gender
0      m
1      M
2      f
3      F
4      m
replace_values = {'m' : 'Male', 'M' : 'Male', 'f':'Female','F':'Female'}
df = df.replace({"Gender": replace_values})
print(df)
  Gender
0    Male
1    Male
2  Female
3  Female
4    Male
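The nested form {"Gender": replace_values} tells pandas to look for the inner keys only in the Gender column, whereas a flat dict is applied to every column. A small sketch of the difference, assuming a hypothetical second column, Unit, that also happens to contain "m":

```python
import pandas as pd

# Hypothetical frame: "Unit" also contains "m", but it means metres here
df = pd.DataFrame({"Gender": ["m", "F"], "Unit": ["m", "kg"]})

replace_values = {"m": "Male", "M": "Male", "f": "Female", "F": "Female"}

# Nested dict: the outer key selects the column, the inner dict maps old -> new
only_gender = df.replace({"Gender": replace_values})
print(only_gender["Gender"].tolist())  # ['Male', 'Female']
print(only_gender["Unit"].tolist())    # ['m', 'kg']

# Flat dict: the mapping is applied to every column, so "m" in Unit changes too
everywhere = df.replace(replace_values)
print(everywhere["Unit"].tolist())     # ['Male', 'kg']
```

Scoping the mapping to one column is the safer choice whenever short codes like "m" or "f" could appear elsewhere in the frame.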