Python: error when using max() in NumPy's where method to define a new column in a pandas DataFrame

I get an error when I use Python's built-in max function inside NumPy's where method. The goal is to create a new column based on the condition passed to where.

I used the following function:

def function(df):
    df["new col"] = np.where(df["col 1"] > 10, max(df["col 1"] - df["col 2"], 0), 0)
    return df

The error I got is as follows:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, if I remove the 0 from max(), the code runs without error. I need the zero in max() to avoid negative values.

 df["new col"]= np.where(df["col 1"]> 10, max(df["col 1"]-df["col 2"]),0)

CodePudding user response：

You can use .clip(lower=0) for this:

np.where(df["col 1"]> 10, (df["col 1"]-df["col 2"]).clip(lower=0), 0)
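As a quick check, here is a minimal sketch of the .clip(lower=0) approach on a small made-up frame. The sample values are an assumption chosen for illustration (they are not from the question); the last row has col 1 > 10 with a negative difference, which is the case the clip is meant to handle.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"col 1": [1, 20, 3, 40, 12], "col 2": [10, 2, 30, 4, 15]})

# Element-wise difference, with negative values raised to 0.
diff = (df["col 1"] - df["col 2"]).clip(lower=0)

# Keep the clipped difference where col 1 > 10, otherwise 0.
df["new col"] = np.where(df["col 1"] > 10, diff, 0)

print(df["new col"].tolist())  # [0, 18, 0, 36, 0]
```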

CodePudding user response：

The error is caused not by np.where but by Python's built-in max. To avoid it, replace the built-in max with NumPy's np.maximum or np.max, depending on what you're trying to achieve.
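To see why the built-in max fails with two arguments, the sketch below reproduces the error in isolation. The Series values are hypothetical. With two arguments, max compares them with ">", which gives a boolean Series, and Python then asks for a single True/False, which pandas refuses to provide.

```python
import pandas as pd

s = pd.Series([-9, 18, -27, 36])  # hypothetical values

try:
    # Built-in max compares its two arguments with ">", which yields a
    # boolean Series; converting that to a single bool raises ValueError.
    max(s, 0)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# With a single argument, max iterates over the Series and returns one scalar,
# which is why dropping the 0 makes the error go away.
largest = max(s)
print(largest)  # 36
```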

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"col 1": [1, 20, 3, 40], "col 2": [10, 2, 30, 4]})

Using `np.maximum`:

df["new col"]= np.where(df["col 1"]> 10, np.maximum(df["col 1"]-df["col 2"],0),0)

Output:

   col 1  col 2  new col
0      1     10        0
1     20      2       18
2      3     30        0
3     40      4       36

Here, rows where col 1 > 10 receive the element-wise maximum of col 1 - col 2 and 0, so a negative difference becomes 0. All other rows receive 0.

Using `np.max`:

df["new col"]= np.where(df["col 1"]> 10, np.max(df["col 1"]-df["col 2"],0),0)

Output:

   col 1  col 2  new col
0      1     10        0
1     20      2       36
2      3     30        0
3     40      4       36

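Here, rows where col 1 > 10 all receive the single largest value of col 1 - col 2 across the whole column (36), while the other rows receive 0. Note that the second argument to np.max is the axis, not a lower bound.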
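The difference between the two functions can be checked side by side. This is a small sketch on a hypothetical Series of differences: np.maximum works element by element, while np.max reduces the whole Series to one number.

```python
import numpy as np
import pandas as pd

diff = pd.Series([-9, 18, -27, 36])  # hypothetical differences

# np.maximum is element-wise: each value is compared with 0 separately.
elementwise = np.maximum(diff, 0)

# np.max is a reduction: its second parameter is the axis, so this
# returns the single largest value in the Series.
reduced = np.max(diff, axis=0)

print(elementwise.tolist())  # [0, 18, 0, 36]
print(int(reduced))          # 36
```

For the question's goal of avoiding negative values row by row, the element-wise result is the one that applies.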
