Replacing a string value in Python

I have a column named "status" whose values are all either the string "legitimate" or the string "phishing". I want to convert them to 0 for "legitimate" and 1 for "phishing". My current approach is to replace "legitimate" with the string "0" and "phishing" with the string "1", then convert the strings "0" and "1" to the int values 0 and 1. I'm getting the error:

TypeError: '(0, status legitimate Name: 0, dtype: object)' is an invalid key

with the following code. What am I doing wrong?

df2 = pd.read_csv('dataset_phishing.csv', usecols=[87], dtype=str)

leg = 'legitimate'
phi = 'phishing'
for i in df2.iterrows():
    if df2[i] == leg:
        df2[i].replace('legitimate', '0')
    elif df2[i] == phi:
        df2[i].replace('phishing', '1')

CodePudding user response：

iterrows() yields (index, row) tuples, and such a tuple can't be used as a key in df2[i], which is why you get that error. Here is a simple solution:

import pandas as pd
df2 = pd.DataFrame([["legitimate"], ["phishing"]], columns=["status"])
phi = 'phishing'
# Assign through a single .loc call; chained df2.iloc[i]["status"] = ... may write to a copy
for i in df2.index:
    df2.loc[i, "status"] = '1' if df2.loc[i, "status"] == phi else '0'
print(df2)
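
To see where the original error came from, it can help to look at what iterrows() actually yields. The sketch below is only an illustration on the same two-row frame; the list name seen is made up for this example.

```python
import pandas as pd

df2 = pd.DataFrame([["legitimate"], ["phishing"]], columns=["status"])

# iterrows() yields (index_label, row_as_Series) pairs, so unpack both parts
seen = []
for idx, row in df2.iterrows():
    # row is a Series; pick the column value out of it by name
    seen.append((idx, row["status"]))

print(seen)
```

Each item is a pair, so passing the whole pair to df2[...] is what triggers the "invalid key" error.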

Here is a more pythonic way to do this:

import pandas as pd
import numpy as np
df2 = pd.DataFrame([["legitimate"], ["phishing"]], columns=["status"])
phi = 'phishing'
# Vectorized: '1' where the label is phishing, '0' for every other value
df2["status"] = np.where(df2["status"] == phi, '1', '0')
print(df2)
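
If you want integers rather than the strings '0' and '1', one option is Series.map with a dict. This is a sketch, not part of the code above; label_to_int is a name chosen for illustration, and any label missing from the dict would come out as NaN.

```python
import pandas as pd

df2 = pd.DataFrame([["legitimate"], ["phishing"]], columns=["status"])

# Hypothetical lookup table; labels not listed here would become NaN
label_to_int = {"legitimate": 0, "phishing": 1}
df2["status"] = df2["status"].map(label_to_int)

print(df2["status"].tolist())
```

This skips the intermediate string step from the question entirely.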

Hope this helps you

CodePudding user response：

Here is another way to do this:

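import pandas as pd
data = {'status': ["legitimate", "phishing"]}
df = pd.DataFrame(data)
leg = 'legitimate'
phi = 'phishing'
# Boolean masks select the matching rows; .loc assigns the integer to them
df.loc[df["status"] == leg, "status"] = 0
df.loc[df["status"] == phi, "status"] = 1
# Cast so the column ends up with an integer dtype
df["status"] = df["status"].astype(int)
print(df)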
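
Since the column only holds two labels, the comparison result itself can be cast to integers. A minimal sketch, assuming every value other than "phishing" should count as 0 (the three-row sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"status": ["legitimate", "phishing", "legitimate"]})

# The comparison gives a boolean Series; astype(int) turns True/False into 1/0
df["status"] = (df["status"] == "phishing").astype(int)

print(df["status"].tolist())
```

Note that a misspelled or unexpected label would silently become 0 here, so it may be worth checking the distinct values in the column first.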