How to use nsmallest with a condition

Time:04-11

I want to use nsmallest on only a range of rows in my DataFrame. I am trying something like this, but it doesn't work at all:

import pandas as pd
df = pd.DataFrame({'day': ['1','1','2','3','3','3'],'price': ['5','4','3','2','6','8'],'income': [20,30,40,20,40,50]})
df.loc[(df['day']>1) & (df['day']<=2) & (df.nsmallest(1,'price').index), 'income'] = 10
print(df.head())

The final outcome should look like:

day  price  income
  1      5      20
  1      4      30
  2      1      10
  2      4      20
  3      6      40
  3      8      50

In other words, I want to retrieve the 3 smallest price values only among the rows where day is equal to 2, and then set income to 10 at those row indices.

Answer:

I suspect that your sample actually should be

df = pd.DataFrame(
    {'day': [1, 1, 2, 3, 3, 3], 'price': [5, 4, 3, 2, 6, 8], 'income': [20, 30, 40, 20, 40, 50]}
)

i.e. all integers rather than digit strings? (.nsmallest() raises a TypeError on string columns, and so does comparing a string column with an integer, as in df['day'] > 1.) If so, you could do

df.loc[df[df["day"] == 2].nsmallest(1, "price").index, "income"] = 10

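This first keeps only the rows where day equals 2, takes the row with the smallest price among them, and passes its index label to .loc to set income to 10, giving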

   day  price  income
0    1      5      20
1    1      4      30
2    2      3      10
3    3      2      20
4    3      6      40
5    3      8      50
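If the data really are digit strings as in the question, one option is to convert those columns to integers first and then apply the same filter-then-nsmallest pattern. The sketch below assumes every value in day and price is an integer-valued string; the variable n is a hypothetical parameter added here to cover the "3 smallest" case mentioned in the question.

```python
import pandas as pd

# The question's original frame: 'day' and 'price' hold digit strings.
df = pd.DataFrame({
    "day": ["1", "1", "2", "3", "3", "3"],
    "price": ["5", "4", "3", "2", "6", "8"],
    "income": [20, 30, 40, 20, 40, 50],
})

# Assumption: all strings are valid integers, so astype(int) succeeds.
# After this, comparisons like df["day"] == 2 and nsmallest() both work.
df = df.astype({"day": int, "price": int})

# Hypothetical parameter: how many of the cheapest day-2 rows to update.
n = 3

# Filter to day == 2 first, then take the n smallest prices within that subset.
# If fewer than n rows match, nsmallest() returns all of the matching rows.
idx = df[df["day"] == 2].nsmallest(n, "price").index

# Use those index labels to set income only on the selected rows.
df.loc[idx, "income"] = 10
print(df)
```

With this sample only one row has day equal to 2, so only that row's income changes, even with n = 3.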