I am new to Python, so I can't really figure this out. I want to change the 'Class' column of my DataFrame 'data' so that '<=50K' values are replaced with 0 and '>50K' values with 1; otherwise I get a type-mismatch error while clustering.
I have tried the following snippets:
data.replace(to_replace='<=50K', value=0)
data.replace(to_replace='>50K', value=1)
and
data.replace(to_replace='<=50K', value=0, method="ffill", inplace=True)
data.replace(to_replace='>50K', value=1, method="ffill", inplace=True)
I would also appreciate any other way to reach this goal besides replacing the string values with integers for clustering, as I cannot think of one.
Answer:
Use a dictionary as the mapping for replace:
import pandas as pd
df = pd.DataFrame([['<=50K', '>50K'],
                   ['a', '<=50K']])
#        0      1
# 0  <=50K   >50K
# 1      a  <=50K
df = df.replace({'<=50K': 0, '>50K': 1})
output:
   0  1
0  0  1
1  a  0
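Applied to the question's case, the same idea can be limited to one column with a nested dictionary ({column: {old: new}}). This is a minimal sketch: the `data` frame and its `age` column are hypothetical stand-ins for the asker's data, and the final `astype(int)` is an extra step to make sure the column is stored as integers before clustering.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data` DataFrame
data = pd.DataFrame({
    "age": [25, 38, 52],
    "Class": ["<=50K", ">50K", "<=50K"],
})

# Nested dict: only look for these labels inside the 'Class' column
data = data.replace({"Class": {"<=50K": 0, ">50K": 1}})

# Store the column as integers rather than generic Python objects
data["Class"] = data["Class"].astype(int)

print(data["Class"].tolist())  # [0, 1, 0]
```

Other columns (here `age`) are left untouched because the outer key restricts the lookup to 'Class'.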
If you are acting on a single column, use map (labels that are missing from the dictionary become NaN):
df[0] = df[0].map({'<=50K': 0, '>50K': 1})
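Because map turns every unmatched label into NaN, it is worth checking for missing values before feeding the column to a clustering algorithm. A small sketch, assuming a hypothetical 'Class' column that contains one unexpected label:

```python
import pandas as pd

# Hypothetical Class column with one label that is not in the mapping
data = pd.DataFrame({"Class": ["<=50K", ">50K", "unknown"]})

mapping = {"<=50K": 0, ">50K": 1}
data["Class"] = data["Class"].map(mapping)

# 'unknown' has no entry in the dict, so it comes back as NaN
print(data["Class"].tolist())
print(data["Class"].isna().sum())  # 1
```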
To give unmatched values a default instead (starting again from the original string DataFrame above):
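d = {'<=50K': 0, '>50K': 1}
# look up each cell, falling back to -1 for anything not in d
# (pandas >= 2.1 renames applymap to DataFrame.map)
df = df.applymap(lambda x: d.get(x, -1))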
output:
    0  1
0   0  1
1  -1  0
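For a single column, the same default can be had without a per-cell lambda by chaining map with fillna. This is a sketch of that alternative on a hypothetical 'Class' column; -1 is just the example default used above.

```python
import pandas as pd

data = pd.DataFrame({"Class": ["<=50K", ">50K", "a"]})
mapping = {"<=50K": 0, ">50K": 1}

# map() leaves unmatched labels as NaN; fillna() swaps in the default (-1),
# and astype(int) turns the float result back into integers
data["Class"] = data["Class"].map(mapping).fillna(-1).astype(int)

print(data["Class"].tolist())  # [0, 1, -1]
```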
Answer:
You can do:
# if only one column:
df['Class'] = df['Class'].apply(lambda x: 0 if x == '<=50K' else 1 if x == '>50K' else x)
# or if you want to apply it to the whole DataFrame:
df = df.applymap(lambda x: 0 if x == '<=50K' else 1 if x == '>50K' else x)
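If the column is known to hold only the two labels, a vectorized comparison does the same job without a lambda. This is a swapped-in alternative, sketched on a hypothetical 'Class' column. Note that any label other than '>50K', including a typo, silently becomes 0, so it only fits truly two-valued data.

```python
import pandas as pd

data = pd.DataFrame({"Class": ["<=50K", ">50K", ">50K", "<=50K"]})

# True where the label is '>50K', False otherwise; astype(int) gives 1/0
data["Class"] = (data["Class"] == ">50K").astype(int)

print(data["Class"].tolist())  # [0, 1, 1, 0]
```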
Answer:
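From your image, it looks like you didn't assign the result back to the original DataFrame. By default replace returns a new DataFrame and leaves the original unchanged, so you can do: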
data = data.replace(to_replace='<=50K',value=0)
data = data.replace(to_replace='>50K',value=1)
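A quick way to see why the first attempt appeared to do nothing: the call returns a modified copy, and the original only changes once that copy is assigned back. A minimal sketch on a hypothetical two-row `data` frame:

```python
import pandas as pd

data = pd.DataFrame({"Class": ["<=50K", ">50K"]})

# replace() returns a new DataFrame; `data` itself is left untouched
result = data.replace(to_replace="<=50K", value=0)
print(data["Class"].tolist())    # ['<=50K', '>50K']
print(result["Class"].tolist())  # [0, '>50K']

# Reassigning the returned frame keeps the change
data = result.replace(to_replace=">50K", value=1)
print(data["Class"].tolist())    # [0, 1]
```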
The reason your inplace version doesn't work is that you also pass the method argument, which makes replace ignore the value argument and fill the to_replace matches using its own strategy (here "ffill") instead. Drop method and it works:
data.replace(to_replace='>50K',value=1, inplace=True)
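With inplace=True the DataFrame is modified directly and the call returns None, so its result should not be assigned back (doing so would overwrite `data` with None). A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

data = pd.DataFrame({"Class": ["<=50K", ">50K"]})

# No `method` argument: `value` is used, `data` is changed in place
ret = data.replace(to_replace="<=50K", value=0, inplace=True)
data.replace(to_replace=">50K", value=1, inplace=True)

print(ret)                     # None
print(data["Class"].tolist())  # [0, 1]
```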