I want to use nsmallest on a range of rows in my data frame. I tried something like this, but it doesn't work at all.
import pandas as pd
df = pd.DataFrame({'day': ['1','1','2','3','3','3'],'price': ['5','4','3','2','6','8'],'income': [20,30,40,20,40,50]})
df.loc[(df['day']>1) & (df['day']<=2) & (df.nsmallest(1,'price').index), 'income'] = 10
print(df.head())
The final outcome should look like:
day | price | income |
---|---|---|
1 | 5 | 20 |
1 | 4 | 30 |
2 | 1 | 10 |
2 | 4 | 20 |
3 | 6 | 40 |
3 | 8 | 50 |
In other words, I want to retrieve the 3 smallest values only where the column day is equal to 2, and then set the column income at those indexes to 10.
CodePudding user response:
I suspect that your sample actually should be
df = pd.DataFrame(
    {'day': [1, 1, 2, 3, 3, 3], 'price': [5, 4, 3, 2, 6, 8], 'income': [20, 30, 40, 20, 40, 50]}
)
i.e. all integers, and not digit strings? (.nsmallest() doesn't work on strings.) If so, you could do
df.loc[df[df["day"] == 2].nsmallest(1, "price").index, "income"] = 10
to get
   day  price  income
0    1      5      20
1    1      4      30
2    2      3      10
3    3      2      20
4    3      6      40
5    3      8      50
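If the columns really are digit strings, as in the question's original sample, one option is to convert them first and then apply the same idea with the range condition from the question. This is only a minimal sketch: the names `in_range`, `n`, and `idx` are made up for illustration, and it assumes every value in `day` and `price` parses cleanly as an integer.

```python
import pandas as pd

# Sample data exactly as posted in the question (digit strings).
df = pd.DataFrame({
    "day": ["1", "1", "2", "3", "3", "3"],
    "price": ["5", "4", "3", "2", "6", "8"],
    "income": [20, 30, 40, 20, 40, 50],
})

# Convert the string columns to integers so that comparisons
# and nsmallest operate on numbers (assumes all values are valid ints).
df = df.astype({"day": int, "price": int})

# Boolean mask for the day range used in the question: 1 < day <= 2.
in_range = (df["day"] > 1) & (df["day"] <= 2)

# Hypothetical parameter: how many of the cheapest rows to update.
n = 1

# Filter first, then take the n smallest prices; keep only their index labels.
idx = df[in_range].nsmallest(n, "price").index

# Use those labels to assign into the original frame.
df.loc[idx, "income"] = 10
print(df)
```

Filtering before calling `nsmallest` is the key point: it restricts the candidates to the wanted days, and the resulting index labels can then be passed straight to `.loc`. If fewer than `n` rows fall in the range, `nsmallest` simply returns the rows that exist.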