I'm trying to add a column to a DataFrame that holds the result of checking whether other columns contain values.
This is a test df I made:
df = pd.DataFrame({"condition":[1,np.nan,np.nan,np.nan,1],"a":[np.nan,4,5,6,np.nan],"b":[np.nan,2,"e",2,np.nan],"c":[3,2,1,2,np.nan]})
What I want to check, for each row, is this: if the "condition" column's value is 1, columns "a", "b", "c" should not have any value except np.nan ("O" if they are all np.nan, "X" otherwise). If "condition" is np.nan, the result should be "-".
The ideal result should be something like this:
   condition       a       b       c  check_result
0          1  np.nan  np.nan       3  "X"
1     np.nan       4       2       2  "-"
2     np.nan       5     "e"       1  "-"
3     np.nan       6       2       2  "-"
4          1  np.nan  np.nan  np.nan  "O"
I've tried two ways.
First, this one:
def na_and_check(domain,var_list):
    y = ""
    for var in var_list:
        # build e.g. '(np.isnan(df["a"]))&(np.isnan(df["b"]))&...'
        y += '(np.isnan(' + domain + '["' + var + '"]))&'
    y = y[:-1]  # drop the trailing '&'
    print(y)
    return eval(y)
df["check_result"] = np.where(df["condition"]!=1,"-",np.where(na_and_check("df",["a","b","c"]),"O","X"))
But since the "b" column has a string value in it, it gives me this error:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
And for the second one, I tried this:
df["check_result"] = np.where(df["condition"]!=1,"-",np.where(df[["a","b","c"]].any(),"X","O"))
I wasn't confident about .any(), and sure enough, it gives me an error:
ValueError: operands could not be broadcast together with shapes (5,) () (3,)
So I don't know what to do next. Any advice or tips will be appreciated. Thanks for stopping by. Have a wonderful day.
CodePudding user response:
So you have an if-elif-else situation, and np.select fits it. It needs the conditions and what to do when each one is satisfied:
- your if is: "condition is 1 and a, b, c are all NaN"
- your elif is: "condition is NaN"
- what remains is the else, as usual
conditions = [df.condition.eq(1) & df[["a", "b", "c"]].isna().all(axis=1),
              df.condition.isna()]
what_to_do = ["O", "-"]
else_case = "X"
df["check_result"] = np.select(conditions, what_to_do, default=else_case)
df
   condition    a    b    c check_result
0        1.0  NaN  NaN  3.0            X
1        NaN  4.0    2  2.0            -
2        NaN  5.0    e  1.0            -
3        NaN  6.0    2  2.0            -
4        1.0  NaN  NaN  NaN            O
So we don't write a condition for the else case; it goes to default.
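As for why the two attempts in the question failed, here is a minimal sketch of how they could be repaired, assuming the same test df. np.isnan only works on numeric dtypes, so the object column "b" raises the TypeError; DataFrame.isna() handles object columns as well. And .any() without an axis reduces each column, giving 3 values that cannot be broadcast against the 5 rows; axis=1 reduces each row instead. The names cols, all_nan, has_value, via_where and via_select are only illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "condition": [1, np.nan, np.nan, np.nan, 1],
    "a": [np.nan, 4, 5, 6, np.nan],
    "b": [np.nan, 2, "e", 2, np.nan],
    "c": [3, 2, 1, 2, np.nan],
})

cols = ["a", "b", "c"]  # illustrative name for the columns being checked

# First attempt, repaired: isna() works on the object column "b" too,
# and all(axis=1) gives one boolean per row (5 values).
all_nan = df[cols].isna().all(axis=1)

# Second attempt, repaired: notna() makes "has a value" explicit (a 0 still
# counts as a value), and any(axis=1) reduces per row instead of per column.
has_value = df[cols].notna().any(axis=1)

# Nested np.where, as in the question, now with row-shaped masks.
via_where = np.where(df["condition"] != 1, "-",
                     np.where(has_value, "X", "O"))

# The np.select form from the answer, for comparison.
via_select = np.select(
    [df["condition"].eq(1) & all_nan, df["condition"].isna()],
    ["O", "-"],
    default="X",
)

df["check_result"] = via_where
print(df)
```

On this data both forms give the same column. They would differ for a "condition" value that is neither 1 nor NaN: the nested np.where marks it "-", while np.select falls through to "X".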