Replacing multiple strings in a dataframe

I am trying to replace several spellings of a categorical value in a dataframe column with a standard set of values.

I tried the following code:

data['Gender'] = data['Gender'].replace(to_replace={"male","M","m","female","f","F"}, value={"Male","Male","Male","Female", "Female", "Female"})

I want every m, M, or male to be replaced by Male, and likewise every f, F, or female to be replaced by Female.

I got this error:

ValueError: Replacement lists must match in length. Expecting 6 got 2

CodePudding user response：

The issue with your code is that you pass sets as the arguments to replace. The size is fine for to_replace, since all of its elements are unique. A set drops duplicates, though, so the set you write for value actually becomes {"Male", "Female"}, which has 2 elements and does not match the 6 in to_replace. Even if the sizes matched, sets do not guarantee an order, so they are not a suitable data structure for pairing old values with new ones. If you use lists or tuples instead, this just works:

data['Gender'] = data['Gender'].replace(to_replace=("male","M","m","female","f","F"), value=("Male","Male","Male","Female", "Female", "Female"))

A dict may be easier to read, though, because each old value sits right next to its replacement:

data["Gender"] = data["Gender"].replace({"m" : "Male", "M" : "Male", "male": "Male", "f": "Female", "F": "Female", "female": "Female"})
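
As a quick check of the point about sets, here is a minimal sketch in plain Python (no pandas needed). The variable names to_replace_list, value_list, and pairs are only for illustration. It shows that the value set from the question collapses to two elements, while lists keep duplicates and order so the items pair up as intended:

```python
# The two arguments from the question, written as set literals.
to_replace = {"male", "M", "m", "female", "f", "F"}
value = {"Male", "Male", "Male", "Female", "Female", "Female"}

# A set drops duplicates, so the second one ends up with only two elements.
print(len(to_replace))  # 6
print(len(value))       # 2

# Lists keep duplicates and order, so old and new values line up pairwise.
to_replace_list = ["male", "M", "m", "female", "f", "F"]
value_list = ["Male", "Male", "Male", "Female", "Female", "Female"]
pairs = list(zip(to_replace_list, value_list))
print(pairs[0], pairs[-1])  # ('male', 'Male') ('F', 'Female')
```

The 6 and 2 printed here are the same numbers that appear in the ValueError message.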

CodePudding user response：

Here is one way.

import pandas as pd
df = pd.DataFrame({'Gender': ['m', 'M', 'f', 'F', 'm']})
print(df)

  Gender
0      m
1      M
2      f
3      F
4      m

# Map each spelling to its label, then apply the mapping to the Gender column only
replace_values = {'m': 'Male', 'M': 'Male', 'f': 'Female', 'F': 'Female'}
df = df.replace({"Gender": replace_values})
print(df)

   Gender
0    Male
1    Male
2  Female
3  Female
4    Male
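
If the list of spellings grows, writing the mapping by hand gets repetitive. One possible alternative, sketched below in plain Python, is to list the spellings per label and invert that into the {old: new} dict. The groups dict and its contents are hypothetical, chosen to match the values in the question:

```python
# Hypothetical grouping: each label lists the spellings it should absorb.
groups = {
    "Male": ["m", "M", "male"],
    "Female": ["f", "F", "female"],
}

# Invert the grouping into the {old_value: new_value} shape used above.
replace_values = {
    alias: label
    for label, aliases in groups.items()
    for alias in aliases
}

print(replace_values)
# {'m': 'Male', 'M': 'Male', 'male': 'Male', 'f': 'Female', 'F': 'Female', 'female': 'Female'}
```

The resulting replace_values can be passed to df.replace({"Gender": replace_values}) in the same way as the hand-written dict.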