Compare two columns in a for loop in Pandas

Time:04-11

I have the following dataframe, where I want to determine whether column A is greater than column B and whether column B is greater than column C. Whenever the value in the right-hand column is smaller, I want to replace it with 0.

import numpy as np
import pandas as pd

d = {'A': [6, 8, 10, 1, 3], 'B': [4, 9, 12, 0, 2], 'C': [3, 14, 11, 4, 9]}
df = pd.DataFrame(data=d)
df

I have tried this with np.where and it works:

df['B'] = np.where(df['A'] > df['B'], 0, df['B'])
df['C'] = np.where(df['B'] > df['C'], 0, df['C'])

However, I have a large number of columns and I want to know if there is any way to do this without writing each comparison separately, for example with a for loop.

Thanks

CodePudding user response:

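This solution gives a different output from your np.where code, because it compares the original columns: DataFrame.diff computes the difference between neighbouring columns, and DataFrame.mask sets the values with a negative difference to 0: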

df1 = df.mask(df.diff(axis=1).lt(0), 0)
print (df1)
    A   B   C
0   6   0   0
1   8   9  14
2  10  12   0
3   1   0   4
4   3   0   9
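
To see why B and C are both zeroed in row 0, it may help to look at the intermediate results. This is only a sketch on the question's data; the names step and smaller are mine, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8, 10, 1, 3],
                   'B': [4, 9, 12, 0, 2],
                   'C': [3, 14, 11, 4, 9]})

# Difference between each column and the column to its left.
# The first column has no left neighbour, so it is NaN.
step = df.diff(axis=1)

# True where a value is smaller than its left neighbour in the ORIGINAL data.
# NaN < 0 is False, so column A is never flagged.
smaller = step.lt(0)

# Replace the flagged positions with 0 and keep everything else.
df1 = df.mask(smaller, 0)

print(smaller)
print(df1)
```

Every comparison here uses the untouched input, so C in row 0 is compared with the original B (4), not with the replaced 0.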

If you use a for loop with zip over the shifted column names, the output is different, because each comparison uses the already reassigned columns B, C, ...:

for a, b in zip(df.columns, df.columns[1:]):
    df[b] = np.where(df[a] > df[b], 0, df[b])

print (df)
    A   B   C
0   6   0   3
1   8   9  14
2  10  12   0
3   1   0   4
4   3   0   9
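
Note that the loop above overwrites df in place. If you want to keep the original data, one option is to wrap the same loop in a small function that works on a copy. This is a minimal sketch; the function name zero_if_below_previous is hypothetical:

```python
import numpy as np
import pandas as pd

def zero_if_below_previous(frame):
    """Set a value to 0 when it is smaller than the (already updated)
    value in the column to its left. The input frame is not modified."""
    out = frame.copy()
    # Pair each column name with the next one: (A, B), (B, C), ...
    for prev, cur in zip(out.columns, out.columns[1:]):
        out[cur] = np.where(out[prev] > out[cur], 0, out[cur])
    return out

df = pd.DataFrame({'A': [6, 8, 10, 1, 3],
                   'B': [4, 9, 12, 0, 2],
                   'C': [3, 14, 11, 4, 9]})

result = zero_if_below_previous(df)
print(result)
```

The loop runs once per column pair, not once per row, so it should stay reasonably fast even with many columns.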

CodePudding user response:

For a vectorized approach, you cannot simply use diff, because the condition depends on whether the previous value was replaced by 0 or not. Once a value has been replaced by 0, the next value cannot be smaller than it (for non-negative data), so two consecutive replacements cannot happen.

You can achieve a correct vectorized replacement using a shifted mask:

m1 = df.diff(axis=1).lt(0)               # smaller than the previous column?
m2 = ~m1.shift(axis=1, fill_value=False) # previous column was not flagged itself
df2 = df.mask(m1 & m2, 0)

output:

    A   B   C
0   6   0   3
1   8   9  14
2  10  12   0
3   1   0   4
4   3   0   9
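
With many columns it may be worth checking the shifted mask against the plain loop before relying on it. The sketch below does that; the function names sequential and shifted_mask and the four-column row are made up for illustration, and non-negative values are assumed:

```python
import numpy as np
import pandas as pd

def sequential(frame):
    # Reference behaviour: the column-by-column loop, applied to a copy.
    out = frame.copy()
    for prev, cur in zip(out.columns, out.columns[1:]):
        out[cur] = np.where(out[prev] > out[cur], 0, out[cur])
    return out

def shifted_mask(frame):
    # The shifted-mask approach from this answer.
    m1 = frame.diff(axis=1).lt(0)
    m2 = ~m1.shift(axis=1, fill_value=False)
    return frame.mask(m1 & m2, 0)

df = pd.DataFrame({'A': [6, 8, 10, 1, 3],
                   'B': [4, 9, 12, 0, 2],
                   'C': [3, 14, 11, 4, 9]})

# On the question's three columns both approaches agree.
same_on_example = np.array_equal(sequential(df).to_numpy(),
                                 shifted_mask(df).to_numpy())

# Hypothetical row with three drops in a row: 5 > 4 > 3 > 2.
run = pd.DataFrame({'A': [5], 'B': [4], 'C': [3], 'D': [2]})
loop_row = sequential(run).iloc[0].tolist()    # D is compared with the kept C
mask_row = shifted_mask(run).iloc[0].tolist()  # D is skipped because C was flagged

print(same_on_example, loop_row, mask_row)
```

In this sketch the two agree on the example data, but on the four-column row the loop zeroes D while the shifted mask leaves it alone. So if your data can have three or more decreasing values in a row, the loop is the safer reference.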