I want to randomly retrieve a list of car mpg values, one for each entry in a predefined list of carburetor counts (the carb column of mtcars). The code below works, but it slows processing down considerably. Is there a more efficient way to do this on a dataset containing a million rows?
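library(dplyr)
library(purrr)

list_carbs <- c(1, 3, 4, 4)
get_sample_cars <- function(list_carbs) {
  # One full filter() pass over mtcars per requested carb value
  filtered_cars <- map(list_carbs, ~ mtcars %>% filter(carb == .x))
  # Draw one mpg value from each filtered subset
  res <- map(filtered_cars, ~ sample(.x$mpg, size = 1))
  res
}
mpg_cars <- get_sample_cars(list_carbs)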
Here are two examples of the expected results:
mpg   carb
27.3  1
16.4  3
19.2  4
10.4  4

mpg   carb
32.4  1
17.3  3
19.2  4
14.7  4
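One likely reason for the slowdown is that map() runs a full filter() over the data once for every element of list_carbs, so the work grows with length(list_carbs) times the number of rows. A minimal base-R sketch, assuming the data fit in memory as an ordinary data frame with mpg and carb columns, splits the mpg column by carb once and then only looks up the requested groups. The function name get_sample_cars_split is hypothetical, not from the original post.

```r
# Hypothetical helper (not from the original post): split once, then look up groups.
get_sample_cars_split <- function(df, list_carbs) {
  # Single pass over the data: a named list of mpg vectors, one per carb value
  groups <- split(df$mpg, df$carb)
  # One independent draw per requested carb; repeated values reuse the same group
  lapply(as.character(list_carbs), function(k) {
    x <- groups[[k]]
    # Index-based draw avoids sample()'s 1:x behaviour when x has length 1
    x[sample.int(length(x), 1)]
  })
}

set.seed(1)
mpg_cars <- get_sample_cars_split(mtcars, c(1, 3, 4, 4))
str(mpg_cars)
```

Like the original function, it returns a list with one value per element of list_carbs; wrap the call in unlist() if a plain numeric vector is more convenient. Whether it is fast enough for a million rows would still need to be checked on the real data, for example with system.time().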
Answer:
You can probably simplify your code to this:
mpg_cars <- sample(mtcars$mpg[mtcars$carb %in% list_carbs], size = length(list_carbs))
That is, subset the column you want with whatever condition you need, then sample from the values that remain. Note that this draws from the pooled subset, so it does not guarantee exactly one value per requested carb.
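A short sketch of the subset-then-sample idea, with one caveat worth knowing: when the subset leaves exactly one numeric value x (here carb == 8, which matches a single car in mtcars), sample(x, 1) draws from 1:x instead of returning x. Indexing with sample.int() avoids that. The variable names below are illustrative only.

```r
list_carbs <- c(1, 3, 4, 4)

# Subset the column with a logical index, then sample from what remains
pool <- mtcars$mpg[mtcars$carb %in% list_carbs]
set.seed(42)
pooled_draw <- pool[sample.int(length(pool), size = length(list_carbs))]

# Pitfall: mtcars has a single car with carb == 8 (mpg 15.0)
single <- mtcars$mpg[mtcars$carb == 8]
risky <- sample(single, 1)                      # samples from 1:15, not from single
safe  <- single[sample.int(length(single), 1)]  # always returns 15
```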
Answer:
library(dplyr)

# One random row per distinct carb value in list_carbs
filter(mtcars, carb %in% list_carbs) %>%
  group_by(carb) %>%
  slice_sample(n = 1)
# A tibble: 3 x 11
# Groups: carb [3]
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
2 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
3 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
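EDIT: to draw as many values per carb as the number of times it appears in list_carbs (two for carb 4 here), count the requests with table() and sample that many values from each group: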
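library(dplyr)
library(purrr)
library(tidyr)

mtcars %>%
  select(carb, mpg) %>%
  nest_by(carb) %>%                 # one row per carb, mpg values in a list column
  filter(carb %in% list_carbs) %>%
  # draw as many mpg values as each carb appears in list_carbs (without replacement)
  mutate(data = map2(data, table(list_carbs)[as.character(carb)],
                     ~ sample(.x, .y))) %>%
  unnest(data)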
# A tibble: 4 x 2
# Groups: carb [3]
carb data
<dbl> <dbl>
1 1 22.8
2 3 16.4
3 4 14.3
4 4 14.7
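If the tidyverse pipeline itself becomes the bottleneck, the same per-carb counts can be honoured in base R. This is a sketch under the assumption that the data are an ordinary data frame with mpg and carb columns: it splits once, keeps only the requested groups, and draws table(list_carbs) values from each group without replacement, as the EDIT does.

```r
list_carbs <- c(1, 3, 4, 4)

# How many draws each carb value needs (here 1, 1 and 2)
counts <- table(list_carbs)

# Split mpg by carb once and keep only the requested groups
groups <- split(mtcars$mpg, mtcars$carb)[names(counts)]

# Draw counts[k] values from group k without replacement
set.seed(7)
draws <- Map(function(x, n) x[sample.int(length(x), n)], groups, as.integer(counts))

result <- data.frame(
  carb = as.numeric(rep(names(draws), lengths(draws))),
  mpg  = unlist(draws, use.names = FALSE)
)
result
```

As with the EDIT, requesting a carb value more times than it has rows raises an error, because sampling is without replacement.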