Why do str.contains and np.where return strange results?

Time:01-26

I see that there are many questions about str.contains and np.where, so apologies if this is a duplicate; I have lost track of them.

I am wondering why str.contains inside np.where produces a positive result when it is applied to np.nan. I get a 1, as if the string contained the search term.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Mouse', 'dog', 'cat', '23', np.nan]})
df['B'] = np.where(df.A.str.contains('og'), 1, 0)
print(df)

        A  B
 0  Mouse  0
 1    dog  1
 2    cat  0
 3     23  0
 4    NaN  1

I know that I can get the right result by passing na=False as an argument to str.contains. I am just wondering about the behavior and want to understand why the result comes out like that.

CodePudding user response:

The reason is that df.A.str.contains('og') evaluates to np.nan for the NaN entry, and np.nan is truthy: it is a nonzero float, so bool(np.nan) is True. You can check this with:

import numpy as np
if np.nan:
    print("This gets printed")

Because np.where returns 1 wherever the given condition is truthy, you get back a 1 for the row where the input is NaN.
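To see this step by step, here is a minimal sketch. It assumes the column has object dtype, as in the question (newer pandas string dtypes may treat missing values differently), and the names s, mask and flags are only illustrative. It shows the intermediate mask, the truthiness of NaN, and the na=False variant mentioned in the question.

```python
import numpy as np
import pandas as pd

# Assumption: object dtype, to match the behavior described in the question.
s = pd.Series(['Mouse', 'dog', 'cat', '23', np.nan], dtype=object)

# The intermediate result that np.where receives as its condition.
# For the missing entry it holds a missing value, not False.
mask = s.str.contains('og')
print(mask)

# NaN is a nonzero float, so it counts as truthy.
nan_is_truthy = bool(np.nan)
print(nan_is_truthy)  # True

# With na=False the missing entry becomes False before np.where sees it.
flags = np.where(s.str.contains('og', na=False), 1, 0)
print(flags.tolist())  # [0, 1, 0, 0, 0]
```

Passing na=False keeps the decision about missing values inside str.contains, so np.where only ever sees real booleans.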
