I wrote this np.select call, but the AND operators don't seem to work!
df = pd.DataFrame({'A': [2107], 'B': [76380700]})
cond = [(df["A"]==2107)|(df["A"]==6316)&(df['B']>=10000000)&(df['B']<=19969999),
(df["A"]==2107)|(df["A"]==6316)&(df['B']>=1000000)&(df['B']<=99999999)]
choices =["Return 1", "Return 2"]
df["C"] = np.select(cond, choices, default = df["A"])
np.select returns "Return 1", but the correct option is "Return 2":
>>df["C"]
0 Return 1
The first condition should not match, because this comparison returns False:
>>df["B"]<=19969999
False
How can I solve this problem?
CodePudding user response:
It's an operator precedence issue: & binds more tightly than |. Here's what you wrote:
cond = [
(df["A"]==2107) |
(df["A"]==6316) &
(df['B']>=10000000) &
(df['B']<=19969999),
(df["A"]==2107) |
(df["A"]==6316) &
(df['B']>=1000000) &
(df['B']<=99999999)
]
Here's how that is interpreted:
cond = [
(df["A"]==2107) |
(
(df["A"]==6316) &
(df['B']>=10000000) &
(df['B']<=19969999)
),
(df["A"]==2107) |
(
(df["A"]==6316) &
(df['B']>=1000000) &
(df['B']<=99999999)
)
]
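A quick way to check that grouping is to evaluate the pieces separately. This is only a minimal sketch using the single-row frame from the question; the intermediate names (a_first, a_second, b_in_range) are made up for illustration.

```python
import pandas as pd

# Same single-row frame as in the question.
df = pd.DataFrame({"A": [2107], "B": [76380700]})

a_first = df["A"] == 2107      # True for this row
a_second = df["A"] == 6316     # False for this row
b_in_range = (df["B"] >= 10000000) & (df["B"] <= 19969999)  # False for this row

# `&` binds tighter than `|`, so these two expressions are equivalent.
unparenthesized = a_first | a_second & b_in_range
implicit_grouping = a_first | (a_second & b_in_range)

# Grouping the "or" first gives a different answer.
explicit_grouping = (a_first | a_second) & b_in_range

print(unparenthesized.tolist())    # [True]
print(implicit_grouping.tolist())  # [True]
print(explicit_grouping.tolist())  # [False]
```

Since A == 2107 is True on its own, the unparenthesized condition is True no matter what B is.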
You need parens around the "or" clause:
cond = [
( (df["A"]==2107) | (df["A"]==6316) ) &
(df['B']>=10000000) &
(df['B']<=19969999),
( (df["A"]==2107) | (df["A"]==6316) ) &
(df['B']>=1000000) &
(df['B']<=99999999)
]
And, by the way, there is absolutely nothing wrong with writing the expressions like I did there. Isn't it much more clear what's going on when it's spaced out like that?
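For reference, here is a self-contained sketch of the parenthesized version run end to end on the question's data. One assumption to flag: it uses a string default ("Other", a made-up placeholder) rather than the question's default = df["A"], only so the result column holds a single type.

```python
import numpy as np
import pandas as pd

# Same single-row frame as in the question.
df = pd.DataFrame({"A": [2107], "B": [76380700]})

cond = [
    ((df["A"] == 2107) | (df["A"] == 6316))
    & (df["B"] >= 10000000)
    & (df["B"] <= 19969999),
    ((df["A"] == 2107) | (df["A"] == 6316))
    & (df["B"] >= 1000000)
    & (df["B"] <= 99999999),
]
choices = ["Return 1", "Return 2"]

# "Other" is a placeholder default (assumption, not from the question).
df["C"] = np.select(cond, choices, default="Other")

print(df["C"].tolist())  # ['Return 2']
```

B is outside the first range and inside the second, so only the second condition is True for this row.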
CodePudding user response:
I think you were missing parentheses around (df["A"]==2107)|(df["A"]==6316). In your script, the condition for Return 1 was checking (df["A"]==2107)|(df["A"]==6316)&(df['B']>=10000000)&(df['B']<=19969999), which means A==2107 OR (A==6316 & B... & B...). That's why np.select returns 'Return 1': the condition is True.
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [2107], 'B': [76380700]})
cond = [((df["A"]==2107)|(df["A"]==6316))&(df['B']>=10000000)&(df['B']<=19969999),
((df["A"]==2107)|(df["A"]==6316))&(df['B']>=1000000)&(df['B']<=99999999)]
choices =["Return 1", "Return 2"]
df["C"] = np.select(cond, choices, default = df["A"])
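One way to sidestep the precedence question entirely might be to build the conditions from Series.isin and Series.between, so there is no | left to group. This is just a sketch, not what either answer above proposes: the two extra rows and the "Other" default are assumptions added to exercise each branch.

```python
import numpy as np
import pandas as pd

# First row is from the question; the other two rows are hypothetical.
df = pd.DataFrame({
    "A": [2107, 6316, 5000],
    "B": [76380700, 15000000, 15000000],
})

# isin replaces the (A == 2107) | (A == 6316) pair.
a_match = df["A"].isin([2107, 6316])

# between is inclusive on both ends by default, matching >= and <=.
cond = [
    a_match & df["B"].between(10000000, 19969999),
    a_match & df["B"].between(1000000, 99999999),
]
choices = ["Return 1", "Return 2"]

# "Other" is a placeholder default (assumption, not from the question).
df["C"] = np.select(cond, choices, default="Other")

print(df["C"].tolist())  # ['Return 2', 'Return 1', 'Other']
```

np.select takes the first condition that is True for each row, so the narrower B range has to stay first in the list.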