I would like to transform the values of a pandas DataFrame so that, for instance, the 3 smallest values in each row are set to zero:
row1: 0.21, 0.11, 0.24, 0.52, 0.12
row2: 0.31, 0.01, 0.44, 0.52, 0.52
Would become:
row1: 0.0, 0.0, 0.24, 0.52, 0.0
row2: 0.0, 0.0, 0.0, 0.52, 0.52
I would prefer to do this without a loop.
CodePudding user response:
We can use where with rank on axis=1. rank with method='min' and ascending=False establishes an ordering within each row such that the largest value has rank 1 and the smallest has rank 5 (the total length of the row). We then use where to keep the values with rank less than 3 and replace all other values with 0:
df = df.where(df.rank(axis=1, method='min', ascending=False) < 3, 0)
We can also use the opposite condition with mask, which replaces the values whose rank is 3 or greater with 0 and keeps the rest:
df = df.mask(df.rank(axis=1, method='min', ascending=False) >= 3, 0)
Either option produces df:
     0    1     2     3     4
0  0.0  0.0  0.24  0.52  0.00
1  0.0  0.0  0.00  0.52  0.52
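The condition above relies on the rows having 5 values (keeping the top 2 is the same as zeroing the bottom 3). As a minimal sketch, assuming we would rather count from the small end, an ascending rank lets the number of values to zero be passed directly. The helper name zero_n_smallest and its parameter n are hypothetical, not part of pandas; method='first' is used here so that ties are broken by position and exactly n values per row are zeroed.

```python
import pandas as pd


def zero_n_smallest(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of df with the n smallest values in each row set to 0.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Ascending ranks (the default): the smallest value in each row gets rank 1.
    # method="first" breaks ties by position, so ranks within a row are unique.
    ranks = df.rank(axis=1, method="first")
    # mask replaces values where the condition is True.
    return df.mask(ranks <= n, 0)


df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52],
})

result = zero_n_smallest(df, 3)
print(result)
```

On the sample data this should give the same frame as the where and mask versions, without depending on the row length.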
*Note: depending on the desired behaviour, we may also want method='dense' or method='first', which will change how duplicated values are handled in the ranking.
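To make the effect of the tie-handling method concrete, here is a small sketch on made-up data (the ties frame is an assumption for illustration, not the question's data). It applies the same where condition with each method and counts how many values end up zeroed in each row.

```python
import pandas as pd

# Made-up rows containing duplicated values.
ties = pd.DataFrame([
    [0.5, 0.5, 0.3, 0.1, 0.1],
    [0.5, 0.3, 0.3, 0.1, 0.1],
])

zeroed_counts = {}
for method in ("min", "dense", "first"):
    # Descending ranks, as in the answer: the largest value gets rank 1.
    ranks = ties.rank(axis=1, method=method, ascending=False)
    # Keep values with rank < 3, replace the rest with 0.
    result = ties.where(ranks < 3, 0)
    # Count the zeroed values per row.
    zeroed_counts[method] = (result == 0).sum(axis=1).tolist()
    print(method, zeroed_counts[method])
```

With 'min' and 'dense', tied values share a rank, so the number of zeroed values can differ from row to row; with 'first', every row has distinct ranks, so the same number of values is zeroed in each row.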
Setup:
import pandas as pd

df = pd.DataFrame({
    0: [0.21, 0.31],
    1: [0.11, 0.01],
    2: [0.24, 0.44],
    3: [0.52, 0.52],
    4: [0.12, 0.52]
})
CodePudding user response:
You can try:
A - Use sorted(df["col"].unique()) and take the first three values. Put them into a list a.
B - Use df.loc to remove the rows whose value is in this new list (something like df.loc[~df["col"].isin(a)]; the ~ negates isin so that matching rows are dropped rather than selected).
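Steps A and B can be sketched as follows, assuming a single column named "col" (the placeholder name used in the steps) and made-up values. Note that this filters out whole rows based on one column; it does not set values to zero within each row.

```python
import pandas as pd

# Made-up single-column frame; "col" is the placeholder name from the steps above.
df = pd.DataFrame({"col": [0.21, 0.11, 0.24, 0.52, 0.12, 0.11]})

# A: the three smallest distinct values in the column.
a = sorted(df["col"].unique())[:3]

# B: keep only the rows whose value is NOT in that list (~ negates isin).
filtered = df.loc[~df["col"].isin(a)]
print(filtered)
```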