Python: where clause with two conditions

Time:09-27

I have a DataFrame as follows:

import numpy as np
import pandas as pd

data = [[99330,12,122],
   [1123,1230,1287],
   [123,101,812739],
   [1143,12301230,252]]
df1 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'],
              columns=['col_A', 'col_B', 'col_C'])
df1.index = pd.to_datetime(df1.index)
for col in df1.columns:
    df1[col + '_mean'] = df1[col].rolling(1).mean().shift()
    df1[col + '_std'] = df1[col].rolling(1).std().shift()
    df1[col + '_upper'] = df1[col + '_mean'] + df1[col + '_std']
    df1[col + '_lower'] = df1[col + '_mean'] - df1[col + '_std']
    # The next line is the one that raises the ValueError
    df1[col + '_outlier'] = np.where(df1[col]>df1[col + '_upper'] or df1[col]<df1[col + '_lower'], 1, 0)

However, the last line raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I want a column col + '_outlier' that is 1 if df1[col] > df1[col + '_upper'] or if df1[col] < df1[col + '_lower'], and 0 otherwise.

What's the proper way to write this where clause with two conditions?

CodePudding user response:

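Have a look at the operator precedence table in the official Python documentation. The keyword or tries to reduce each Series to a single True/False value, which is what raises the ValueError; the element-wise equivalent is the pipe |. Because | binds more tightly than > and <, you also need to wrap each comparison in parentheses.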

df1[col + '_outlier'] = np.where((df1[col] > df1[col + '_upper']) | (df1[col] < df1[col + '_lower']), 1, 0)
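
To see the difference in isolation, here is a minimal sketch on a small Series. The values and the fixed bounds lower and upper are made up for illustration and are not taken from the question's data; in the question the bounds would be the col + '_lower' and col + '_upper' columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data and bounds, chosen only to illustrate the pattern
s = pd.Series([5, 50, 500])
lower, upper = 10, 100

# `or` asks for the truth value of a whole Series, which pandas refuses
try:
    (s > upper) or (s < lower)
    raised = False
except ValueError:
    raised = True

# `|` combines the two boolean Series element by element;
# each comparison is parenthesized because `|` binds tighter than > and <
outlier = np.where((s > upper) | (s < lower), 1, 0)

print(raised)   # True
print(outlier)  # [1 0 1]
```

The same pattern works for "and" conditions with & in place of |.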