R categorize numeric value using case_when

Time:04-08

I have data in which the variable id is a unique origin point and the variable distance_km gives the distance to the nearest point of interest. I need a better way to classify each id as being within x km of the point of interest. I can do this manually using case_when, but I was wondering if there is a better way where I can simply define the desired values at the start.

The last within_x_km bin has to be a multiple of 10. In my test example, this is 50.

# sample data
library(dplyr)

data <- tribble(
  ~id, ~distance_km,
  "1",  0.5,
  "2",  1.5,
  "3", 10.5,
  "4", 43,
  "5", 20.7
)

max(data$distance_km)

# the maximum is 43, so the last bin should round up to 50
# if it were 51, we would choose 60

# manually using case_when

final <- data %>% 
  mutate(within_x_km = 
           case_when(distance_km < 1 ~ "1 km",
                     distance_km < 10 ~ "10 km",
                     distance_km < 20 ~ "20 km",
                     # upper bound determined earlier (max rounded up to a multiple of 10)
                     distance_km < 50 ~ "50 km",
                     # a real missing value rather than the string "NA"
                     TRUE ~ NA_character_))

I am looking for a more efficient way to do this. For example, maybe there is a way to create the bins in advance and automatically categorize the numeric variable. Something like the following:

# the last value should always round up to a multiple of 10
within_km <- c(1, 10, 20, 50)
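A minimal sketch of building that vector from the data itself, so the last value is always the maximum distance rounded up to the next multiple of 10. It assumes the lower bounds c(1, 10, 20) stay fixed; the names base_breaks and upper are hypothetical, and a base R data.frame stands in for the tribble so the snippet has no package dependencies.

```r
# Sample distances from the question (base R data frame instead of tribble)
data <- data.frame(
  id = as.character(1:5),
  distance_km = c(0.5, 1.5, 10.5, 43, 20.7)
)

# Fixed lower bounds from the question (assumption: these do not change)
base_breaks <- c(1, 10, 20)

# Round the maximum distance up to the next multiple of 10 (43 -> 50, 51 -> 60);
# a maximum that is already a multiple of 10 is kept as is
upper <- ceiling(max(data$distance_km) / 10) * 10

# Append the upper bound; sort(unique()) guards against a duplicate or out-of-order value
within_km <- sort(unique(c(base_breaks, upper)))
within_km
#> [1]  1 10 20 50
```

With the strict < used in case_when, a maximum that is itself a multiple of 10 (say exactly 50) would not fall below its own bound; floor(max(data$distance_km) / 10) * 10 + 10 always moves to the next multiple instead, if that case matters.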

Thanks

Answer 1:

We could use the cut function:

library(dplyr)

labels <- c("1 km", "10 km", "20 km", "50 km")

data %>% 
  mutate(within_km =  cut(distance_km, 
                          breaks = c(0, 1, 10, 20, 50), 
                          labels = labels))
  id    distance_km within_km
  <chr>       <dbl> <fct>    
1 1             0.5 1 km     
2 2             1.5 10 km    
3 3            10.5 20 km    
4 4            43   50 km    
5 5            20.7 50 km 
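A sketch in base R that derives both the breaks and the labels from a single vector of upper bounds, so nothing is typed twice. It differs from the answer above in one deliberate choice, right = FALSE: by default cut() uses right-closed intervals such as (0, 1], so a distance of exactly 1 would be labelled "1 km" and a distance of 0 would be NA, whereas the question's case_when puts 1 in "10 km". The upper-bound rule (maximum rounded up to a multiple of 10) is an assumption taken from the question's comments.

```r
# Sample distances from the question
data <- data.frame(
  id = as.character(1:5),
  distance_km = c(0.5, 1.5, 10.5, 43, 20.7)
)

# Upper bounds of the bins; the last one is the maximum rounded up to a multiple of 10
upper <- ceiling(max(data$distance_km) / 10) * 10
within_km <- sort(unique(c(1, 10, 20, upper)))

# Labels such as "1 km" and "10 km" are generated from the bounds
labels <- paste(within_km, "km")

# right = FALSE builds [0, 1), [1, 10), ... which mirrors the strict < in case_when
data$within_km <- cut(data$distance_km,
                      breaks = c(0, within_km),
                      labels = labels,
                      right = FALSE)
data
```

Because the labels are built with paste(), changing within_km is the only edit needed when the bins change.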

Answer 2:

Another possibility, which picks for each distance the closest value in vec that is strictly higher (the \(x) lambda syntax requires R 4.1 or later):

vec <- c(1, 10, 20, 50)
sapply(data$distance_km, \(x) paste(vec[which(vec - x == min((vec - x)[(vec- x) > 0]))], "km"))
# [1] "1 km"  "10 km" "20 km" "50 km" "50 km"
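The sapply() call above handles one distance at a time, and for a distance at or beyond the last value of vec it takes min() of an empty vector (which warns and returns Inf), yielding a malformed label instead of NA. A vectorised alternative using base R's findInterval() is sketched below; the variable names mirror the answer's, and returning NA for out-of-range distances is an assumption about the desired behaviour.

```r
vec <- c(1, 10, 20, 50)
distance_km <- c(0.5, 1.5, 10.5, 43, 20.7)

# findInterval() returns i such that vec[i] <= x < vec[i + 1] (0 when x < vec[1]),
# so i + 1 indexes the closest value in vec that is strictly higher than x
idx <- findInterval(distance_km, vec) + 1

# Distances at or beyond the last bound have no higher value; return NA for them
within_km <- ifelse(idx <= length(vec), paste(vec[idx], "km"), NA_character_)
within_km
#> [1] "1 km"  "10 km" "20 km" "50 km" "50 km"
```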