For example, I have the following:
. | a | b | benchmark |
---|---|---|---|
0 | 1 | 2 | 1 |
1 | 1 | 5 | 3 |
and I would like to apply a condition in Pandas for each column as:
def f(x):
    # x is a value from column a or b
    if x > benchmark:
        return x
    else:
        return 0
But I don't know how to do that. If I did df.apply(f), I can't access other cells in the row, as x is just the value of the one cell.
I don't want to create a new column either. I want to change the value of each cell directly as I compare it to benchmark, zeroing the cells that do not meet the benchmark.
Any insight?
CodePudding user response:
You don't need a function; use vectorized operations instead:
out = df.where(df.gt(df['benchmark'], axis=0), 0)
To change the values in place:
df[df.le(df['benchmark'], axis=0)] = 0
Output:
a b benchmark
0 0 2 0
1 0 5 0
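To see why axis=0 matters here, a minimal self-contained sketch may help. It assumes the frame from the question is rebuilt by hand (the constructor call is my assumption, not part of the original post) and prints the intermediate boolean mask before applying it:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({"a": [1, 1], "b": [2, 5], "benchmark": [1, 3]})

# axis=0 aligns the benchmark Series with the row index, so every cell
# in a row is compared against that row's benchmark value.
mask = df.gt(df["benchmark"], axis=0)

# Keep cells where the mask is True and replace the rest with 0.
# This returns a new frame and leaves df untouched.
out = df.where(mask, 0)

print(mask)
print(out)
```

Note that the benchmark column compared with itself is never strictly greater, which is why it ends up zeroed as well.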
If you don't want to affect benchmark:
m = df.le(df['benchmark'], axis=0)
m['benchmark'] = False
df[m] = 0
Output:
a b benchmark
0 0 2 1
1 0 5 3
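Another way to leave benchmark alone, sketched here as an alternative rather than as part of the answer above, is to restrict the operation to the other columns from the start. The name cols is just illustrative, and the frame is again rebuilt by hand from the question:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({"a": [1, 1], "b": [2, 5], "benchmark": [1, 3]})

# Every column except the benchmark itself (hypothetical helper name).
cols = df.columns.difference(["benchmark"])

# Compare only those columns row-wise against benchmark, zero the cells
# that do not exceed it, and write the result back into the same columns.
df[cols] = df[cols].where(df[cols].gt(df["benchmark"], axis=0), 0)

print(df)
```

This avoids building a mask over the whole frame and then patching the benchmark column back to False.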