I have a dataset with the variable id as a unique origin point and the variable distance_km, which gives the distance to the nearest point of interest. I need a better way to classify each id as being within x km of the point of interest. I can do this manually using case_when, but I was wondering if there is a better way where I can simply initialize the desired values at the start.
The last within_x_km threshold has to be a multiple of 10. So in my test example, this is 50.
library(dplyr)   # mutate(), case_when(), %>%
library(tibble)  # tribble()
# sample data
data <- tribble(
  ~id, ~distance_km,
  "1", 0.5,
  "2", 1.5,
  "3", 10.5,
  "4", 43,
  "5", 20.7
)
max(data$distance_km)
# so the max value should round up to 50
# if this was 51, we would choose 60
# manually using case_when
final <- data %>%
  mutate(within_x_km =
           case_when(distance_km < 1 ~ "1 km",
                     distance_km < 10 ~ "10 km",
                     distance_km < 20 ~ "20 km",
                     # max value we determined earlier
                     distance_km < 50 ~ "50 km",
                     # a real missing value rather than the string "NA"
                     TRUE ~ NA_character_))
I am looking for a more efficient way to do this. For example, maybe there is a way to create the bins in advance and automatically categorize the numeric variable. Something like the following:
# the last value should always round up to a multiple of 10
within_km <- c(1, 10, 20, 50)
Thanks
CodePudding user response:
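We could use the cut function. Note that cut uses right-closed intervals by default, so a distance of exactly 1 lands in "1 km" here, whereas the case_when version in the question (which uses <) would put it in "10 km":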
library(dplyr)
labels <- c("1 km", "10 km", "20 km", "50 km")
data %>%
  mutate(within_km = cut(distance_km,
                         breaks = c(0, 1, 10, 20, 50),
                         labels = labels))
  id    distance_km within_km
  <chr>       <dbl> <fct>
1 1             0.5 1 km
2 2             1.5 10 km
3 3            10.5 20 km
4 4            43   50 km
5 5            20.7 50 km
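The breaks and labels in the cut call above can also be derived from a single vector defined up front, with the last threshold computed from the data, which is what the question asks for. The following is only a minimal sketch under some assumptions: the names inner_km and upper_km are made up for illustration, the distances are assumed to be positive, and right = FALSE with include.lowest = TRUE is chosen so the bins match the < comparisons of the original case_when while still including a maximum that is exactly a multiple of 10.

```r
library(dplyr)
library(tibble)

data <- tribble(
  ~id, ~distance_km,
  "1", 0.5,
  "2", 1.5,
  "3", 10.5,
  "4", 43,
  "5", 20.7
)

# Thresholds chosen up front (hypothetical name; values taken from the question)
inner_km <- c(1, 10, 20)

# Round the largest distance up to the next multiple of 10 (43 -> 50, 51 -> 60)
upper_km <- ceiling(max(data$distance_km) / 10) * 10

# Full set of thresholds; sort/unique guards against upper_km repeating an inner value
within_km <- sort(unique(c(inner_km, upper_km)))

final <- data %>%
  mutate(within_x_km = cut(distance_km,
                           breaks = c(0, within_km),
                           labels = paste(within_km, "km"),
                           right = FALSE,          # bins are [lower, upper), like `<`
                           include.lowest = TRUE)) # but the last bin includes its upper end

final
```

With the sample data this should give the same five labels as the manual case_when. The result is a factor; wrap the cut call in as.character() if a character column is preferred.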
CodePudding user response:
Another possibility, which picks, for each distance, the closest higher value in vec:
vec <- c(1, 10, 20, 50)
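# \(x) is the anonymous-function shorthand available from R 4.1
sapply(data$distance_km, \(x) {
  d <- vec - x
  # the smallest positive gap identifies the closest threshold above x
  paste(vec[which(d == min(d[d > 0]))], "km")
})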
# [1] "1 km" "10 km" "20 km" "50 km" "50 km"
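The same "closest higher value" lookup can probably be done without a per-element sapply loop. The sketch below swaps in base R's findInterval, which is not what this answer uses; it assumes vec is sorted in increasing order, and the distances are copied inline from the question's sample data so the snippet runs on its own.

```r
vec <- c(1, 10, 20, 50)
distance_km <- c(0.5, 1.5, 10.5, 43, 20.7)  # sample distances from the question

# findInterval() counts how many elements of vec are <= each distance,
# so adding 1 gives the index of the first threshold strictly above it
idx <- findInterval(distance_km, vec) + 1

# A distance at or beyond the last threshold indexes past the end of vec,
# which yields NA; keep that as a real missing value instead of "NA km"
within_km <- ifelse(is.na(vec[idx]), NA_character_, paste(vec[idx], "km"))

within_km
```

Unlike the sapply version, a distance of 50 or more simply becomes NA here rather than triggering a warning from min() on an empty vector.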