I have the following dataframe where I want to determinate if the column A is greater than column B and if column C is greater of column B. In case it is smaller, I want to change that value for 0.
d = {'A': [6, 8, 10, 1, 3], 'B': [4, 9, 12, 0, 2], 'C': [3, 14, 11, 4, 9] }
df = pd.DataFrame(data=d)
df
I have tried this with the np.where and it is working:
df[B] = np.where(df[A] > df[B], 0, df[B])
df[C] = np.where(df[B] > df[C], 0, df[C])
However, I have a huge amount of columns and I want to know if there is any way to do this without writing each comparation separately. For example, a loop for.
Thanks
CodePudding user response:
Solution with different ouput, because is compared original columns with DataFrame.diff
and set less like 0
values to 0
by DataFrame.mask
:
df1 = df.mask(df.diff(axis=1).lt(0), 0)
print (df1)
A B C
0 6 0 0
1 8 9 14
2 10 12 0
3 1 0 4
4 3 0 9
If use list comprehension with zip
shifted columns names output is different, because is compared already assigned columns B, C...
:
for a, b in zip(df.columns, df.columns[1:]):
df[b] = np.where(df[a] > df[b], 0, df[b])
print (df)
A B C
0 6 0 3
1 8 9 14
2 10 12 0
3 1 0 4
4 3 0 9
CodePudding user response:
To use a vectorial approach, you cannot simply use a diff as the condition depends on the previous value being replaced or not by 0. Thus two consecutive diff cannot happen.
You can achieve a correct vectorial replacement using a shifted mask:
m1 = df.diff(axis=1).lt(0) # check if < than previous
m2 = ~m1.shift(axis=1, fill_value=False) # and this didn't happen twice
df2 = df.mask(m1&m2, 0)
output:
A B C
0 6 0 3
1 8 9 14
2 10 12 0
3 1 0 4
4 3 0 9