Fill new column based on condition

I have a dataframe of shape (5004346 x 14). I am trying to add a new column (Risk_level) based on whether the values in one column (iter) appear in one of several lists.

I have:

def assign_risk(df):
    if  df['iter'].isin(no):
        val = 'no_risk'
    elif  df['iter'].isin(low):
        val = 'low_risk'
    elif  df['iter'].isin(med):
        val = 'med_risk'
    else:
        val = 'high_risk'
    return val

# create new column with risk attribute
df['Risk_level'] = df.apply(assign_risk, axis=1)
df

But I'm getting the error: 'str' object has no attribute 'isin'.

A minimal working example:

import pandas as pd

temp = pd.DataFrame({'iter' : [1,1,1,2,2,2,3,3,3,4,4,4,4]})
temp2 = [1,2]
temp3 = [3]
temp4 = [4]

def test_temp(df):
    if  df['iter'].isin(temp2):
        val = 'no_risk'
    elif  df['iter'].isin(temp3):
        val = 'low_risk'
    else:
        val = 'high_risk'
    return val

temp['test'] = temp.apply(test_temp, axis=1)
temp

This also fails, with: 'numpy.int64' object has no attribute 'isin'.

Why isn't this working? I've been trying for hours and don't understand.

CodePudding user response：

Inside apply with axis=1, df['iter'] is a scalar value, not a Series (personally, I would name the argument row and write row['iter']). You need to change isin to in:

def test_temp(df):
    if  df['iter'] in temp2:
        val = 'no_risk'
    elif  df['iter'] in temp3:
        val = 'low_risk'
    else:
        val = 'high_risk'
    return val

Alternatively, you can use np.select, which works on the whole column at once:

import numpy as np

temp['test_where'] = np.select(
    [temp['iter'].isin(temp2),
     temp['iter'].isin(temp3)],
    ['no_risk', 'low_risk'],
    'high_risk'
)
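
Applied to the four levels in the original question, the same idea might look like the sketch below. The contents of no, low and med are made up for illustration, since the question does not show them, and the small df stands in for the real dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical list contents; the question does not show the real ones.
no, low, med = [1, 2], [3], [4]

# Small stand-in for the real dataframe.
df = pd.DataFrame({'iter': [1, 2, 3, 4, 5]})

# Conditions are checked in order; the first one that matches wins.
conditions = [
    df['iter'].isin(no),
    df['iter'].isin(low),
    df['iter'].isin(med),
]
choices = ['no_risk', 'low_risk', 'med_risk']

# Rows matching none of the conditions get the default.
df['Risk_level'] = np.select(conditions, choices, default='high_risk')
```

Because every isin call runs on the whole column, this avoids calling a Python function once per row, which matters for a dataframe with millions of rows.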

CodePudding user response：

You can use replace or map

temp['iter'].replace({
    1: "no_risk",
    2: "no_risk",
    3: "low_risk",
    4: "high_risk"
})

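or, if you want to build the mapping from the lists, expand each list so that every element becomes its own key. Using the lists as tuple keys does not work here, because a tuple key is not equal to any single value in the column.

mapping = {
    **dict.fromkeys(temp2, "no_risk"),
    **dict.fromkeys(temp3, "low_risk"),
    **dict.fromkeys(temp4, "high_risk"),
}
temp['iter'].replace(mapping)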
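
The map variant mentioned above is not shown, so here is a minimal sketch of it. It assumes the same temp2 and temp3 lists, uses a made-up column name test_map, and relies on fillna to supply a default for values missing from the lookup.

```python
import pandas as pd

temp = pd.DataFrame({'iter': [1, 1, 2, 3, 4, 5]})
temp2, temp3 = [1, 2], [3]

# One flat lookup: every list element becomes its own key.
lookup = {v: 'no_risk' for v in temp2}
lookup.update({v: 'low_risk' for v in temp3})

# map returns NaN for values that are not in the lookup;
# fillna then assigns the default label to those rows.
temp['test_map'] = temp['iter'].map(lookup).fillna('high_risk')
```

Unlike replace, map leaves no unmapped value in place, so a value that appears in none of the lists cannot slip through unlabelled.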

You can also use NumPy's where:

import numpy as np
np.where(temp["iter"].isin(temp2), "no_risk",
         np.where(temp["iter"].isin(temp3), "low_risk", "high_risk"))

Why does isin fail?

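Inside apply with axis=1 the function receives one row at a time, so df['iter'] is a single value (a str or numpy.int64), and scalars have no isin method. Called on the whole column instead, isin returns a Series of booleans (a mix of True and False), which is ambiguous in an if check.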
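
A small sketch of both situations: the scalar seen inside apply(axis=1), and the boolean Series that isin returns on a whole column. The column values are made up for illustration.

```python
import pandas as pd

temp = pd.DataFrame({'iter': [1, 1, 2, 3, 4]})
temp2 = [1, 2]

# With apply(axis=1) the function gets one row, so row['iter'] is a scalar.
first_row = temp.iloc[0]
value = first_row['iter']
scalar_has_isin = hasattr(value, 'isin')  # scalars have no .isin method

# On the whole column, isin works and returns one boolean per row.
mask = temp['iter'].isin(temp2)

# Using that mask directly in an `if` raises ValueError,
# because pandas cannot reduce several booleans to a single one.
try:
    if mask:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True
```

The mask is meant to be used for selection or assignment, for example temp.loc[mask, 'test'] = 'no_risk', not as the condition of an if statement.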