I have a dataframe that looks like this:
Segment  Distance  ID
3        0.348     1203
8        0.439     1204
4        0.458     1205
8        0.284     1207
3        0.359     1208
8        0.563     1209
4        0.388     1210
I want to be able to enter id = '1203' in Python and get the 3 nearest observations according to the distance.
So when I enter id = '1203', this is my desired output:
Segment  Distance  ID
3        0.359     1208
4        0.388     1210
8        0.284     1207
I only want my output to be based on the distance variable, not segment, but I want the respective segment number to show up in my output.
I know it's a simple problem, but I am fairly new to Python, so I am a little confused about how to approach this. Can someone help me out? Thanks.
CodePudding user response:
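Extract the distance for ID '1203', subtract it from the 'Distance' column of the whole dataframe, take the absolute values and sort them. Drop the first row (ID '1203' itself, whose difference is 0) and keep the next three:
# .iloc[0] is positional, so it works even if the matching row's index label is not 0
dist = df.loc[df['ID'] == '1203', 'Distance'].iloc[0]
# tail(-1) drops the first sorted row (the target); head(3) keeps the three nearest
out = df.loc[df['Distance'].sub(dist).abs().sort_values().tail(-1).head(3).index]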
print(out)
# Output:
   Segment  Distance    ID
4        3     0.359  1208
6        4     0.388  1210
3        8     0.284  1207
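For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The helper name nearest_by_distance and its parameters are made up for illustration, and it assumes the ID column holds strings (as in the lookup with '1203') and that the dataframe index has no duplicate labels. Instead of dropping the first sorted row, it excludes the target row by its ID, which also covers the case where another row has exactly the same distance.

```python
import pandas as pd

# Sample data from the question. Storing IDs as strings is an assumption,
# made because the question looks rows up with id = '1203'.
df = pd.DataFrame({
    "Segment": [3, 8, 4, 8, 3, 8, 4],
    "Distance": [0.348, 0.439, 0.458, 0.284, 0.359, 0.563, 0.388],
    "ID": ["1203", "1204", "1205", "1207", "1208", "1209", "1210"],
})


def nearest_by_distance(frame, target_id, n=3):
    """Return the n rows whose Distance is closest to that of target_id.

    Hypothetical helper, not part of pandas.
    """
    is_target = frame["ID"] == target_id
    if not is_target.any():
        raise KeyError(f"ID {target_id!r} not found")
    # Positional access, so the index label of the matching row does not matter.
    dist = frame.loc[is_target, "Distance"].iloc[0]
    # Absolute gap to the target for every other row, smallest gap first.
    gaps = frame.loc[~is_target, "Distance"].sub(dist).abs().sort_values()
    # Select the original rows in order of increasing gap.
    return frame.loc[gaps.index[:n]]


out = nearest_by_distance(df, "1203")
print(out)
```

With the sample data this returns the rows for IDs 1208, 1210 and 1207, in that order, with their Segment values attached.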
CodePudding user response:
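I think a good approach is to sort by the 'Distance' column, ranked by its absolute difference from the target's distance, and then take the three rows after the target with iloc. Sorting by the raw 'Distance' values is not enough, because the nearest rows can lie on either side of the target (here ID '1207' is below it):
dist = df.loc[df['ID'] == '1203', 'Distance'].iloc[0]
ranked = df.sort_values(by='Distance', key=lambda s: (s - dist).abs())
ranked.iloc[1:4]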
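If a full sort feels like overkill, a possible variation is to ask pandas directly for the three smallest gaps with Series.nsmallest. This is only a sketch under the same assumptions as the question's sample data (string IDs, default integer index); the variable names are arbitrary.

```python
import pandas as pd

# Sample data from the question, with IDs assumed to be strings.
df = pd.DataFrame({
    "Segment": [3, 8, 4, 8, 3, 8, 4],
    "Distance": [0.348, 0.439, 0.458, 0.284, 0.359, 0.563, 0.388],
    "ID": ["1203", "1204", "1205", "1207", "1208", "1209", "1210"],
})

target = "1203"
# Distance value of the target row (positional access to the single match).
dist = df.loc[df["ID"] == target, "Distance"].iloc[0]
# Leave the target itself out before ranking the remaining rows.
others = df[df["ID"] != target]
# nsmallest returns the 3 smallest gaps in ascending order, keeping index labels.
nearest_idx = others["Distance"].sub(dist).abs().nsmallest(3).index
result = others.loc[nearest_idx]
print(result)
```

The selection is driven only by 'Distance'; 'Segment' and 'ID' simply come along because whole rows are selected by index label.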