How to find a row in a data frame whose values in 2 columns are closest to my own values in R?

For example, I have this dataframe:

ID	height	price
1	10	12
2	13	7
3	4	33
4	10	15
5	8	49
6	4	2
7	5	11

And I have my own target values:

height = 11
price = 14

I want to locate the row with ID 4, because its height and price are closest to my own values. How can I do this in R? I've tried some dplyr functions but haven't had any luck so far.
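The question does not name a distance measure. Assuming plain Euclidean distance on the raw height and price values, the squared distances below (base R only, no packages) show why ID 4 is the closest row.

```r
# Example data from the question
df <- data.frame(
  ID = 1:7,
  height = c(10, 13, 4, 10, 8, 4, 5),
  price = c(12, 7, 33, 15, 49, 2, 11)
)

# Squared Euclidean distance from each row to (height = 11, price = 14)
d2 <- (df$height - 11)^2 + (df$price - 14)^2
d2
#> [1]    5   53  410    2 1234  193   45

# ID of the row with the smallest distance
df$ID[which.min(d2)]
#> [1] 4
```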

Answer:

Another possible solution:

library(tidyverse)

h = 11
p = 14

df <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L),
  height = c(10L, 13L, 4L, 10L, 8L, 4L, 5L),
  price = c(12L, 7L, 33L, 15L, 49L, 2L, 11L)
)

df %>% 
  mutate(dist = sqrt((height - h)^2 + (price - p)^2)) %>% 
  slice_min(dist) %>% 
  select(ID)

#>   ID
#> 1  4
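If you would rather not load tidyverse, the same computation can be sketched in base R. `closest_id()` is a hypothetical helper name, not a function from any package; like the pipeline above, it uses the Euclidean distance on the raw columns.

```r
# Hypothetical helper: ID of the row nearest to (h, p) in Euclidean distance
closest_id <- function(data, h, p) {
  dist <- sqrt((data$height - h)^2 + (data$price - p)^2)
  # which.min() returns only the first minimum; slice_min() keeps ties by default
  data$ID[which.min(dist)]
}

df <- data.frame(
  ID = 1:7,
  height = c(10, 13, 4, 10, 8, 4, 5),
  price = c(12, 7, 33, 15, 49, 2, 11)
)

closest_id(df, h = 11, p = 14)
#> [1] 4
```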

Answer:

Assuming you want the Euclidean distance, here is a quick way to do it. I use the squared distance because it is only needed for sorting, and dropping the square root does not change which row is closest.

library(dplyr)

df |>
  mutate(dist = (11 - height)^2 + (14 - price)^2) |>
  filter(dist == min(dist))

##>   ID height price dist
##> 1  4     10    15    2
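Two properties this answer relies on can be checked in base R: taking the square root does not change which row ranks first, and `filter(dist == min(dist))` keeps every tied row. The extra row below (ID 8, height 12, price 13) is hypothetical, added only to create a tie with ID 4.

```r
df <- data.frame(
  ID = 1:8,
  height = c(10, 13, 4, 10, 8, 4, 5, 12),  # row 8 is hypothetical
  price = c(12, 7, 33, 15, 49, 2, 11, 13)
)

d2 <- (11 - df$height)^2 + (14 - df$price)^2  # squared distance
d  <- sqrt(d2)                                # Euclidean distance

# sqrt() is increasing, so both give the same ranking of rows
identical(order(d2), order(d))
#> [1] TRUE

# Keeping all minima, as filter(dist == min(dist)) does, returns both tied IDs
df$ID[d2 == min(d2)]
#> [1] 4 8

# which.min() keeps only the first of the tied rows
df$ID[which.min(d2)]
#> [1] 4
```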

Answer:

This function returns the position of the first row with the minimum Euclidean distance from the given point.

dat <- read.table(text = "
ID  height  price
1   10  12
2   13  7
3   4   33
4   10  15
5   8   49
6   4   2
7   5   11
", header = TRUE)

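choice <- function(x, height, price){
  # Euclidean distance between two numeric vectors
  d <- function(x, y) sqrt(sum((x - y)^2))
  # Distance of every row from the target point (first column, ID, dropped)
  y <- apply(x[-1], 1, d, y = c(height, price))
  # Position of the first row with the smallest distance
  which.min(y)
}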
choice(dat, 11, 14)
#> [1] 4

Created on 2022-03-23 by the reprex package (v2.0.1)
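Note that `choice()` returns the position of the closest row, not its `ID` value; the two coincide here only because `ID` runs from 1 to 7 in row order. A sketch with the same rows sorted by price (the reordering is purely illustrative) shows the difference and how to look up the ID instead.

```r
# choice() as in the answer: position of the row nearest to (height, price)
choice <- function(x, height, price){
  d <- function(x, y) sqrt(sum((x - y)^2))
  y <- apply(x[-1], 1, d, y = c(height, price))
  which.min(y)
}

dat <- data.frame(
  ID = 1:7,
  height = c(10, 13, 4, 10, 8, 4, 5),
  price = c(12, 7, 33, 15, 49, 2, 11)
)

# Same rows sorted by price (illustrative only); ID 4 is now the 5th row
sorted <- dat[order(dat$price), ]
rownames(sorted) <- NULL

choice(sorted, 11, 14)
#> [1] 5

# Look up the ID value at that position instead
sorted$ID[choice(sorted, 11, 14)]
#> [1] 4
```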