I got an error when I used Python's built-in max function inside NumPy's where function. The goal is to create a new column based on the condition passed to np.where.
I used the following function:
def function(df):
    df["new col"] = np.where(df["col 1"] > 10, max(df["col 1"] - df["col 2"], 0), 0)
    return df
The error I got is as follows:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
However, if I remove the 0 from max(), the code runs. I need the zero in the max function to avoid negative values.
df["new col"]= np.where(df["col 1"]> 10, max(df["col 1"]-df["col 2"]),0)
CodePudding user response:
You can use .clip(lower=0) for this:
np.where(df["col 1"]> 10, (df["col 1"]-df["col 2"]).clip(lower=0), 0)
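As a rough end-to-end sketch of the .clip approach, the snippet below builds a small DataFrame and assigns the result to a new column. The sample values are made up for illustration; the last row is chosen so that col 1 > 10 holds while the difference is negative, which is the case clip(lower=0) is there to handle.

```python
import numpy as np
import pandas as pd

# Sample data, invented for illustration only
df = pd.DataFrame({"col 1": [1, 20, 3, 40, 15], "col 2": [10, 2, 30, 4, 25]})

# Element-wise difference, with negative values replaced by 0
diff = (df["col 1"] - df["col 2"]).clip(lower=0)

# Keep the clipped difference where col 1 > 10, otherwise use 0
df["new col"] = np.where(df["col 1"] > 10, diff, 0)

print(df["new col"].tolist())  # [0, 18, 0, 36, 0]
```

The last row has col 1 = 15 and col 2 = 25, so the raw difference is -10 and the clipped result is 0.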
CodePudding user response:
The error is caused not by the np.where function but by max. To avoid it, you can replace Python's built-in max with NumPy's np.max or np.maximum, depending on what you're trying to achieve.
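To see where the error comes from, here is a minimal sketch that reproduces it on a standalone Series (the values in s are hypothetical). The built-in max compares its arguments and needs a single True or False from that comparison, but comparing a Series with 0 yields a whole Series of booleans, which pandas refuses to collapse into one truth value.

```python
import numpy as np
import pandas as pd

s = pd.Series([-9, 18, -27, 36])  # hypothetical differences

# Built-in max compares the Series with 0 and then needs one boolean,
# but the comparison result is itself a boolean Series.
try:
    max(s, 0)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# np.maximum compares element by element and returns a Series
elementwise = np.maximum(s, 0)
print(elementwise.tolist())  # [0, 18, 0, 36]

# np.max reduces the whole Series to a single scalar
print(np.max(s))  # 36
```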
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame ({"col 1":[1,20,3,40],"col 2":[10,2,30,4]})
Using np.maximum:
df["new col"]= np.where(df["col 1"]> 10, np.maximum(df["col 1"]-df["col 2"],0),0)
Output:
   col 1  col 2  new col
0      1     10        0
1     20      2       18
2      3     30        0
3     40      4       36
Here each position where col 1 > 10 receives col 1 - col 2 for that same row, or 0 if that difference is negative. All other positions receive 0.
Using np.max:
df["new col"]= np.where(df["col 1"]> 10, np.max(df["col 1"]-df["col 2"],0),0)
Output:
   col 1  col 2  new col
0      1     10        0
1     20      2       36
2      3     30        0
3     40      4       36
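Here every position where col 1 > 10 receives the maximum of col 1 - col 2 taken over the whole column (36), while the other positions receive 0. Note that the 0 passed to np.max is interpreted as the axis argument, not as a value to compare against.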