Clean method to make the smallest n columns of Pandas Dataframe equal to zero

I would like to transform the values of a Pandas DataFrame so that, for instance, the 3 smallest values in each row are set to zero:

row1: 0.21, 0.11, 0.24, 0.52, 0.12
row2: 0.31, 0.01, 0.44, 0.52, 0.52

Would become:

row1: 0.0, 0.0, 0.24, 0.52, 0.0
row2: 0.0, 0.0, 0.0, 0.52, 0.52

Preferably, I would like to do this without a loop.

Answer:

We can use where with rank on axis=1. rank with method='min' and ascending=False ranks the values within each row in descending order, so the largest value gets rank 1 and the smallest gets rank 5 (the length of the row). We then use where to keep only the values ranked 1 or 2 (rank less than 3) and replace everything else, i.e. the 3 smallest values, with 0:

df = df.where(df.rank(axis=1, method='min', ascending=False) < 3, 0)
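To see the ordering that rank produces, the intermediate rank matrix can be inspected on its own. This is a small sketch using the sample data from the question (the same frame as in the setup below):

```python
import pandas as pd

# Sample data from the question; columns are labelled 0-4 as in the setup.
df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52],
})

# Row-wise descending ranks: the largest value in each row gets rank 1.
# Row 0 ranks are 3, 5, 2, 1, 4; in row 1 both 0.52 values share rank 1.
ranks = df.rank(axis=1, method='min', ascending=False)
print(ranks)

# Keep values ranked 1 or 2; the 3 lowest-ranked values become 0.
result = df.where(ranks < 3, 0)
print(result)
```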

We can also use the opposite condition with mask, which replaces values where the condition is True: values ranked 3 or lower in the descending order (rank >= 3, i.e. the 3 smallest) are replaced with 0 and the top 2 are kept:

df = df.mask(df.rank(axis=1, method='min', ascending=False) >= 3, 0)

Either option produces df:

     0    1     2     3     4
0  0.0  0.0  0.24  0.52  0.00
1  0.0  0.0  0.00  0.52  0.52
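The threshold 3 above is tied to this frame having 5 columns: keeping ranks below 3 leaves the top 5 - 3 = 2 values. A sketch generalising it to any n, using a hypothetical helper named zero_smallest_n (not part of pandas):

```python
import pandas as pd


def zero_smallest_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: set the n smallest values in each row to 0.

    With 5 columns and n=3, keeping ranks <= 5 - 3 = 2 is the same
    as the `< 3` condition used in the answer.
    """
    # Descending row-wise rank: the largest value gets rank 1.
    ranks = df.rank(axis=1, method='min', ascending=False)
    # Keep the top (ncols - n) values and replace the rest with 0.
    return df.where(ranks <= df.shape[1] - n, 0)


df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52],
})
print(zero_smallest_n(df, 3))
```

With method='min', values tied at the cut-off all share the better rank and are all kept, so a row with ties among its larger values can end up with fewer than n zeros.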

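Note: depending on the desired behaviour, we may also want method='dense' or method='first', which change how tied values are ranked. For example, with method='dense' the two 0.52 values in the second row both get rank 1 and 0.44 gets rank 2, so the rank < 3 condition would also keep 0.44.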
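A quick way to compare the tie-handling options is to run the same where condition with each method on the sample data; only the second row contains a tie, so that is the row printed:

```python
import pandas as pd

df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52],
})

results = {}
for method in ('min', 'dense', 'first'):
    # Same descending rank and `< 3` condition, varying only the tie rule.
    ranks = df.rank(axis=1, method=method, ascending=False)
    results[method] = df.where(ranks < 3, 0)
    print(method, results[method].loc[1].tolist())
```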

Setup:

import pandas as pd

df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52]
})
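As a variant (not part of the original answer), the n smallest values can be targeted directly by ranking in ascending order and masking ranks 1 to n. Using method='first' breaks ties by column position, so exactly n values per row become 0 even when values repeat. A minimal sketch with the sample data; n is a name introduced here:

```python
import pandas as pd

df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52],
})

n = 3  # number of smallest values to zero in each row

# Ascending row-wise rank: the smallest value gets rank 1.
# method='first' gives tied values distinct ranks in column order,
# so ranks 1..n always select exactly n cells per row.
ranks = df.rank(axis=1, method='first')
result = df.mask(ranks <= n, 0)
print(result)
```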

Answer:

You can try:

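A - Use sorted(df["col"].unique()) to sort the distinct values of the column and take the first three (the smallest) as a list, e.g. a = sorted(df["col"].unique())[:3].

B - Use df.loc with a negated isin to remove the rows whose value is in this list: df.loc[~df["col"].isin(a)]. Without the ~, df.loc[df["col"].isin(a)] would instead select only those rows.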
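A runnable sketch of steps A and B, assuming a hypothetical single-column frame whose column is named "col" (the name and the data are placeholders):

```python
import pandas as pd

# Placeholder data; "col" is a hypothetical column name.
df = pd.DataFrame({"col": [0.21, 0.11, 0.24, 0.52, 0.12, 0.11]})

# A: the three smallest distinct values in the column.
a = sorted(df["col"].unique())[:3]

# B: keep only the rows whose value is NOT in that list.
remaining = df.loc[~df["col"].isin(a)]
print(remaining)
```

This drops whole rows based on a single column. To set those values to 0 instead of dropping the rows, df.loc[df["col"].isin(a), "col"] = 0 could be used.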