Home > front end >  Vectorized way to find if 1 value in row from list of columns is greater than threshold
Vectorized way to find if 1 value in row from list of columns is greater than threshold

Time:04-16

I am new to using np.where(), so of course I tried to use my new toy for this problem. But it does not work.

I have a dataframe.

Close  M    N    O    P
0.1   0.2  0.3  0.4  0.5
0.2   0.1  0.6  0.1  0.0

Colslist = [M,N,O,P] 

I want a new column called Q with the result of this formula. The formula I thought up is:

df["Q"] = np.where(df[Colslist] >= (Close   0.3), 1,0)

The output would be a 1 in both rows. In row 0 there are 2 values greater than, and row 1 there is 1 value greater than.

I believe the problem in what I wrote is it requires all values to be greater than to output a 1.

So what I am needing is a 1 in Q column if there is a single value in that row for the list of columns that is greater than equal to close of the same row 0.3.

What's the best vectorized way to do this?

CodePudding user response:

The problem is that the axes don't match in your condition. The output of

df[Colslist] >= (df['Close']   0.3)

is

       M      N      O      P      0      1
0  False  False  False  False  False  False
1  False  False  False  False  False  False

which doesn't make sense.

You could use ge method to make sure that you're comparing values in Colslist with "Close" column values. So the result of:

df[Colslist].ge(df['Close']   0.3, axis=0)

is

       M      N      O      P
0  False  False   True   True
1  False   True  False  False

Now, since your condition is that it is True if there is at least one True in a row, you can use any on axis=1. Then the final code:

Colslist = ['M','N','O','P'] 
df["Q"] = np.where(df[Colslist].ge(df['Close']   0.3, axis=0).any(axis=1), 1,0)

Output:

   Close    M    N    O    P  Q
0    0.1  0.2  0.3  0.4  0.5  1
1    0.2  0.1  0.6  0.1  0.0  1

CodePudding user response:

Here's a one-liner with pure Pandas so you don't need to convert between NumPy and Pandas (might help with speed if your dataframe is really long):

df["Q"] = (df.max(axis=1) >= df["Close"]   0.3).astype(int)

It finds the max value in each row and sees if it's larger than the required value. If the max value isn't large enough, then no value in the row will be. It takes advantage of the fact that you don't actually need to count the number of elements in a row that are greater than df["Close"] 0.3; you just need to know if at least 1 element meets the condition.

Then it converts the True and False answers to 1 and 0 using astype(int).

  • Related