Why do str.contains and np.where return strange results?

I see that there are many questions regarding str.contains and np.where, so sorry if it's a duplicate. I just lost the overview.

I am wondering why str.contains inside np.where produces a positive result when it is applied to np.nan (that is, I get a 1, as if the string contained the search word)?

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['Mouse', 'dog', 'cat', '23', np.nan]})
df['B'] = np.where(df.A.str.contains('og'), 1, 0)
print(df)

        A  B
 0  Mouse  0
 1    dog  1
 2    cat  0
 3     23  0
 4    NaN  1

I know that I can get the right result by passing na=False as an argument to str.contains. I am just wondering about the behavior and want to understand why the result comes out like that.

CodePudding user response:

The reason is that df.A.str.contains('og') evaluates to NaN for the missing entry, and NaN is truthy: it is a nonzero float, so bool(np.nan) is True. You can check that like this:

import numpy as np

if np.nan:
    print("This gets printed")

Since np.where returns 1 in your case wherever the condition evaluates to True, you get back a 1 for the row where the input is NaN.
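To make the mechanism visible, here is a minimal sketch that puts the intermediate mask next to the np.where result. It assumes column A has object dtype (forced explicitly here, since newer pandas string dtypes may handle missing values differently), and the variable names mask, naive and fixed are only illustrative.

```python
import numpy as np
import pandas as pd

# Force object dtype so the missing value stays a plain float NaN (assumption:
# this mirrors the setup in the question on older pandas versions).
df = pd.DataFrame({'A': pd.Series(['Mouse', 'dog', 'cat', '23', np.nan], dtype=object)})

# Without na=..., the missing entry is propagated as NaN instead of True/False.
mask = df['A'].str.contains('og')
print(mask.tolist())

# NaN is a nonzero float, so it counts as truthy.
print(bool(np.nan))

# np.where coerces the condition to booleans, which turns the NaN into True.
naive = np.where(mask, 1, 0)
print(naive.tolist())

# na=False tells str.contains to report missing values as False instead.
fixed = np.where(df['A'].str.contains('og', na=False), 1, 0)
print(fixed.tolist())
```

With na=False the mask is a plain boolean Series, so np.where no longer has to guess what a missing value should mean.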