I have a dataframe of shape (5004346 x 14). I am trying to add a new column (Risk_level) based on whether the values in one column (iter) match the values in different lists.
I have:
    def assign_risk(df):
        if df['iter'].isin(no):
            val = 'no_risk'
        elif df['iter'].isin(low):
            val = 'low_risk'
        elif df['iter'].isin(med):
            val = 'med_risk'
        else:
            val = 'high_risk'
        return val

    # create new column with risk attribute
    df['Risk_level'] = df.apply(assign_risk, axis=1)
    df
But I'm getting the error: 'str' object has no attribute 'isin'.
A minimal working example:
    import pandas as pd

    temp = pd.DataFrame({'iter': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
    temp2 = [1, 2]
    temp3 = [3]
    temp4 = [4]

    def test_temp(df):
        if df['iter'].isin(temp2):
            val = 'no_risk'
        elif df['iter'].isin(temp3):
            val = 'low_risk'
        else:
            val = 'high_risk'
        return val

    temp['test'] = temp.apply(test_temp, axis=1)
    temp
This also fails, with: 'numpy.int64' object has no attribute 'isin'.
Why isn't this working? I've been trying for hours and don't understand.
CodePudding user response:
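With apply(..., axis=1), the function receives one row at a time, so df['iter'] inside it is a scalar value, not a Series (personally, I would name the parameter row and write row['iter']). A scalar has no isin method, so you need to change isin to the in operator: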
    def test_temp(df):
        if df['iter'] in temp2:
            val = 'no_risk'
        elif df['iter'] in temp3:
            val = 'low_risk'
        else:
            val = 'high_risk'
        return val
Alternatively, you can use np.select, which works on the whole column at once and avoids apply:
    temp['test_where'] = np.select(
        [temp['iter'].isin(temp2),
         temp['iter'].isin(temp3)],
        ['no_risk', 'low_risk'],
        'high_risk'
    )
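The same idea extends to the four risk levels in the original question. This is only a sketch: the contents of no, low and med are not shown in the question, so the values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real 5M-row frame
df = pd.DataFrame({'iter': ['a', 'b', 'c', 'd', 'e', 'a']})

# Hypothetical list contents; the question does not show the real ones
no = ['a']
low = ['b', 'c']
med = ['d']

# Conditions are checked in order; the first one that is True wins
conditions = [
    df['iter'].isin(no),
    df['iter'].isin(low),
    df['iter'].isin(med),
]
choices = ['no_risk', 'low_risk', 'med_risk']

# Rows matching none of the conditions get the default
df['Risk_level'] = np.select(conditions, choices, default='high_risk')
print(df)
```

Here isin is called on the whole column, which is where it belongs, and no Python-level loop over rows is needed.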
CodePudding user response:
You can use replace or map:
    temp['iter'].replace({
        1: "no_risk",
        2: "no_risk",
        3: "low_risk",
        4: "high_risk"
    })
Or, if you want to use the lists as dict keys, convert them to tuples first, though I personally think this is less Pythonic:
    temp['iter'].replace({
        tuple(temp2): "no_risk",
        tuple(temp3): "low_risk",
        tuple(temp4): "high_risk"
    })
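If you would rather not write out every value by hand, one possible sketch is to flatten the lists into a plain value-to-label dict and pass it to map. The groups and mapping names are my own, and the fillna default for unmatched values is an assumption about the desired behaviour.

```python
import pandas as pd

temp = pd.DataFrame({'iter': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
temp2 = [1, 2]
temp3 = [3]
temp4 = [4]

# Hypothetical helper dict: one label per list
groups = {'no_risk': temp2, 'low_risk': temp3, 'high_risk': temp4}

# Flatten to {value: label}, e.g. 1 -> 'no_risk', 3 -> 'low_risk'
mapping = {value: label for label, values in groups.items() for value in values}

# map leaves NaN for values missing from the dict; fall back to 'high_risk'
temp['test_map'] = temp['iter'].map(mapping).fillna('high_risk')
print(temp)
```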
You can also use numpy's where:
    import numpy as np
    np.where(temp["iter"].isin(temp2), "no_risk",
             np.where(temp["iter"].isin(temp3), "low_risk", "high_risk"))
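Why does isin fail?
Inside apply(..., axis=1), df['iter'] is a single value (a str or a numpy.int64), and scalars have no isin method, hence the AttributeError. Even when isin is called on a whole Series, it returns an array of booleans (a mix of True/False), which is ambiguous in an if check and raises a ValueError.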
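To see the difference between calling isin on a scalar and using the result of isin on a whole Series in an if statement, here is a small sketch using the example data. The variable names are my own.

```python
import pandas as pd

temp = pd.DataFrame({'iter': [1, 2, 3, 4]})
temp2 = [1, 2]

# Case 1: a scalar pulled from one row has no isin method
scalar = temp['iter'].iloc[0]
try:
    scalar.isin(temp2)
    scalar_error = None
except AttributeError as exc:
    scalar_error = type(exc).__name__

# Case 2: isin on a Series works, but returns one boolean per row
mask = temp['iter'].isin(temp2)
try:
    if mask:  # the truth value of a multi-element Series is ambiguous
        pass
    series_error = None
except ValueError as exc:
    series_error = type(exc).__name__

print(scalar_error, series_error)
print(mask.tolist())
```

The boolean mask from the second case is exactly what np.select and np.where expect, which is why those approaches work without an if.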