I am new to using np.where()
, so of course I tried to use my new toy for this problem. But it does not work.
I have a dataframe.
Close M N O P
0.1 0.2 0.3 0.4 0.5
0.2 0.1 0.6 0.1 0.0
Colslist = [M,N,O,P]
I want a new column called Q with the result of this formula. The formula I thought up is:
df["Q"] = np.where(df[Colslist] >= (Close 0.3), 1,0)
The output would be a 1 in both rows. In row 0 there are 2 values greater than, and row 1 there is 1 value greater than.
I believe the problem in what I wrote is it requires all values to be greater than to output a 1.
So what I am needing is a 1 in Q column if there is a single value in that row for the list of columns that is greater than equal to close of the same row 0.3.
What's the best vectorized way to do this?
CodePudding user response:
The problem is that the axes don't match in your condition. The output of
df[Colslist] >= (df['Close'] 0.3)
is
M N O P 0 1
0 False False False False False False
1 False False False False False False
which doesn't make sense.
You could use ge
method to make sure that you're comparing values in Colslist
with "Close" column values. So the result of:
df[Colslist].ge(df['Close'] 0.3, axis=0)
is
M N O P
0 False False True True
1 False True False False
Now, since your condition is that it is True if there is at least one True in a row, you can use any
on axis=1
. Then the final code:
Colslist = ['M','N','O','P']
df["Q"] = np.where(df[Colslist].ge(df['Close'] 0.3, axis=0).any(axis=1), 1,0)
Output:
Close M N O P Q
0 0.1 0.2 0.3 0.4 0.5 1
1 0.2 0.1 0.6 0.1 0.0 1
CodePudding user response:
Here's a one-liner with pure Pandas
so you don't need to convert between NumPy
and Pandas
(might help with speed if your dataframe is really long):
df["Q"] = (df.max(axis=1) >= df["Close"] 0.3).astype(int)
It finds the max value in each row and sees if it's larger than the required value. If the max value isn't large enough, then no value in the row will be. It takes advantage of the fact that you don't actually need to count the number of elements in a row that are greater than df["Close"] 0.3
; you just need to know if at least 1 element meets the condition.
Then it converts the True
and False
answers to 1
and 0
using astype(int)
.