Pandas mask with composite expression behaviour

This question was previously asked (and then deleted) by another user. I was looking for a solution so I could post an answer when the question disappeared. Since I still can't make sense of pandas' behaviour, I would appreciate some clarity. The original question stated something along the lines of:

How can I replace every negative value except those in a given list with NaN in a Pandas dataframe?

My setup to reproduce the scenario is the following:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A' : [x for x in range(4)],
    'B' : [x for x in range(-2, 2)]
})

This should technically only be a matter of correctly passing a boolean expression to `DataFrame.where`. My attempted solution looks like:

df[df >= 0 | df.isin([-2])]

which produces:

index	A	B
0	0	NaN
1	1	NaN
2	2	0
3	3	1

which also replaces the number in the list (-2) with NaN!

Moreover, if I mask the dataframe with each of the two conditions separately, I get the correct behaviour:

with `df[df >= 0]` (identical to the compound result)

index	A	B
0	0	NaN
1	1	NaN
2	2	0
3	3	1

with `df[df.isin([-2])]` (only the listed value -2 is kept)

index	A	B
0	NaN	-2.0
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN

So it seems that either I am running into some undefined behaviour as a result of performing logic on NaN values, or I have got something wrong.

Can anyone clarify this situation for me?

CodePudding user response:

Solution

df[(df >= 0) | (df.isin([-2]))]
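
Applied to the DataFrame from the question, the parenthesised mask can be checked with a minimal sketch like the one below. The variable names `keep` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Parenthesise each condition so that | combines two boolean DataFrames.
keep = (df >= 0) | df.isin([-2])

# Indexing with a boolean DataFrame keeps values where the mask is True
# and puts NaN everywhere else; df.where(keep) does the same thing.
result = df[keep]
print(result)
```

Only the -1 in column B should come out as NaN; the -2 is kept because it is in the list.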

Explanation

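In Python, bitwise OR, |, has a higher operator precedence than comparison operators like >=, so `df >= 0 | df.isin([-2])` is evaluated as `df >= (0 | df.isin([-2]))` rather than `(df >= 0) | df.isin([-2])`: https://docs.python.org/3/reference/expressions.html#operator-precedence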
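
One way to see how the precedence rule groups the expression, without running pandas at all, is to ask Python's own parser, as in the sketch below. It uses the standard-library `ast` module; `df` and `mask` are just placeholder names inside the parsed strings.

```python
import ast

# Parse the unparenthesised expression; nothing is evaluated here.
unparenthesised = ast.parse("df >= 0 | mask", mode="eval").body

# The top-level node is the comparison, and its right-hand side
# is the bitwise OR: df >= (0 | mask).
print(type(unparenthesised).__name__)                 # Compare
print(type(unparenthesised.comparators[0]).__name__)  # BinOp

# With explicit parentheses the bitwise OR becomes the top-level node,
# with the comparison as its left operand: (df >= 0) | mask.
parenthesised = ast.parse("(df >= 0) | mask", mode="eval").body
print(type(parenthesised).__name__)       # BinOp
print(type(parenthesised.left).__name__)  # Compare
```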

When filtering a pandas DataFrame on multiple boolean conditions, you need to enclose each condition in parentheses. More from the boolean indexing section of the pandas user guide:

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
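
The same grouping rule applies to & and ~. As a hedged illustration, the original task can also be phrased the other way round with `DataFrame.mask`, which replaces values where the condition is True; the name `drop` is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Values to blank out: negative AND not in the allowed list.
# Each comparison is parenthesised before combining with &.
drop = (df < 0) & ~df.isin([-2])

# mask() sets NaN where the condition is True (the inverse of where()).
result = df.mask(drop)
print(result)
```

By De Morgan's laws, `drop` is the complement of `(df >= 0) | df.isin([-2])`, so this should give the same result as the boolean-indexing solution.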
