My pandas DataFrame replace function is not working

I am new to Python and can't figure this out. I want to change the 'Class' column in my DataFrame 'data' so that the value '<=50K' is replaced with 0 and the value '>50K' is replaced with 1, so that I don't get a type-mismatch error while clustering.

See The Code Here

I have tried to do so in the following snippets:

data.replace(to_replace='<=50K',value=0)
data.replace(to_replace='>50K',value=1)

and

data.replace(to_replace='<=50K',value=0, method ="ffill", inplace=True)
data.replace(to_replace='>50K',value=1, method ="ffill", inplace=True)

Any other approach to achieve this goal, besides replacing the string values with integers before clustering, would be much appreciated, as I cannot think of another way.

CodePudding user response：

Use a dictionary as reference for replace:

import pandas as pd

df = pd.DataFrame([['<=50K', '>50K'],
                   ['a', '<=50K']])
#        0      1
# 0  <=50K   >50K
# 1      a  <=50K

df = df.replace({'<=50K': 0, '>50K': 1})

output:

   0  1
0  0  1
1  a  0
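If only the 'Class' column from the question should be touched, replace also accepts a nested dictionary keyed by column name. A minimal sketch, assuming a made-up 'age' column alongside 'Class' (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; only the 'Class' column name comes from the question
data = pd.DataFrame({'age': [39, 50, 38],
                     'Class': ['<=50K', '>50K', '<=50K']})

# Outer key = column to act on, inner dict = old value -> new value
data = data.replace({'Class': {'<=50K': 0, '>50K': 1}})

print(data['Class'].tolist())
```

Other columns are left untouched, even if they happen to contain the same strings.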

If acting on a single column of the original df, use map (values that are not keys of the dictionary become NaN):

df[0] = df[0].map({'<=50K': 0, '>50K': 1})

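To handle unmapped values with a default (again starting from the original df):

d = {'<=50K': 0, '>50K': 1}
# In pandas 2.1+, DataFrame.map is the newer name for applymap
df = df.applymap(lambda x: d.get(x, -1))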

output:

   0  1
0  0  1
1 -1  0
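For a single column, a similar default can probably be had with map followed by fillna, since map yields NaN for anything missing from the dictionary. A sketch under the assumption that the column is named 'Class' as in the question; the 'other' value is made up to show the fallback:

```python
import pandas as pd

# Hypothetical sample; 'other' stands in for any unexpected label
data = pd.DataFrame({'Class': ['<=50K', '>50K', 'other']})
d = {'<=50K': 0, '>50K': 1}

# map() gives NaN for 'other', fillna() swaps in the default,
# and astype(int) undoes the float upcast caused by the NaN
data['Class'] = data['Class'].map(d).fillna(-1).astype(int)

print(data['Class'].tolist())
```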

CodePudding user response：

You can do:

# if only one column ('<=50K' -> 0, '>50K' -> 1, anything else unchanged):
df['Class'] = df['Class'].apply(lambda x: 0 if x == '<=50K' else 1 if x == '>50K' else x)

# or if you want to apply it to the whole DataFrame:
df = df.applymap(lambda x: 0 if x == '<=50K' else 1 if x == '>50K' else x)
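Since the question also asks for other ways to get numeric labels, one option is to let pandas assign integer codes through a categorical column. This is only a sketch, assuming 'Class' holds exactly the two labels from the question; the codes follow the sorted order of the categories, which happens to give '<=50K' the code 0 and '>50K' the code 1 here:

```python
import pandas as pd

# Hypothetical sample using the two labels from the question
data = pd.DataFrame({'Class': ['<=50K', '>50K', '<=50K']})

# Convert to a categorical; categories are sorted, so '<=50K' sorts before '>50K'
cat = data['Class'].astype('category')
print(list(cat.cat.categories))

# Each value is replaced by the integer position of its category
data['Class'] = cat.cat.codes
print(data['Class'].tolist())
```

If the column contained other labels, they would get their own codes, so an explicit dictionary is the safer choice when the mapping has to be exact.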

CodePudding user response：

From your image, you didn't assign the result of replace back to the original variable. replace returns a new DataFrame and leaves data unchanged, so you can do:

data = data.replace(to_replace='<=50K',value=0)
data = data.replace(to_replace='>50K',value=1)
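The effect of the assignment can be checked on a small frame. A sketch with hypothetical sample data, where only the 'Class' column name is taken from the question:

```python
import pandas as pd

# Hypothetical two-row sample
data = pd.DataFrame({'Class': ['<=50K', '>50K']})

# Without assignment the returned frame is discarded; data keeps its strings
data.replace(to_replace='<=50K', value=0)
unchanged = data['Class'].tolist()
print(unchanged)

# Assigning the returned frame back keeps the change
data = data.replace(to_replace='<=50K', value=0)
data = data.replace(to_replace='>50K', value=1)
print(data['Class'].tolist())
```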

The reason your inplace attempt doesn't work is that you passed the method argument, which makes replace ignore the value argument and apply its own fill strategy (here, forward fill) to the to_replace value. Drop method and use:

data.replace(to_replace='>50K',value=1, inplace=True)
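Both labels can also be handled in one in-place call by passing a dictionary and leaving out method. A minimal sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample; only the 'Class' column name comes from the question
data = pd.DataFrame({'Class': ['<=50K', '>50K', '<=50K']})

# One call, no method argument: each key is replaced by its value in place
data.replace({'<=50K': 0, '>50K': 1}, inplace=True)

print(data['Class'].tolist())
```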