I see that there are many questions regarding str.contains and np.where, so sorry if it's a duplicate. I just lost track of them.
I am wondering why str.contains inside np.where produces a positive result when it is applied to NaN (I get a 1, as if the string contained the search term).
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['Mouse', 'dog', 'cat', '23', np.nan]})
df['B'] = np.where(df.A.str.contains('og'), 1, 0)
print(df)
       A  B
0  Mouse  0
1    dog  1
2    cat  0
3     23  0
4    NaN  1
I know that I can get the right result by passing na=False as an argument to str.contains.
I am just wondering about the behavior and want to understand why the result comes up like that.
CodePudding user response:
The reason is that df.A.str.contains('og') evaluates to NaN for the NaN entry, and NaN is truthy: it is a nonzero float, so bool(np.nan) is True. You can check that like this:
import numpy as np
if np.nan:
    print("This gets printed")
np.where returns 1 in your case wherever the given condition is truthy, so you get back a 1 where the input has a NaN.
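To make this concrete, here is a small sketch that reproduces the effect and compares it with the na=False variant from the question. It assumes an object-dtype Series (requested explicitly here, so that the missing value stays a float NaN), and the mask containing NaN is built by hand to mirror what str.contains returns in the question; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Object dtype is requested explicitly (an assumption of this sketch),
# so the missing value is kept as a float NaN inside the Series.
s = pd.Series(['Mouse', 'dog', 'cat', '23', np.nan], dtype=object)

# NaN is a nonzero float, so it is truthy.
nan_is_truthy = bool(np.nan)

# Hand-built mask that mirrors the result in the question: the last
# element is NaN instead of True/False. np.where treats it as true.
mask_with_nan = np.array([False, True, False, False, np.nan], dtype=object)
flags_with_nan = np.where(mask_with_nan, 1, 0)

# With na=False the mask is purely boolean, so the NaN row maps to 0.
mask_clean = s.str.contains('og', na=False)
flags_clean = np.where(mask_clean, 1, 0)

print(nan_is_truthy)
print(flags_with_nan.tolist())
print(flags_clean.tolist())
```

If you want the NaN rows to stay missing rather than become 0, you would have to handle them separately, since np.where only ever sees a condition that is truthy or falsy.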