Optimize random sampling with filter and map functions

I want to randomly retrieve a list of car mpg values based on a predefined list of fuel types (represented here by the carb column of mtcars). The code below works, but it is slow. Is there a better way to apply this approach to a dataset with a million rows?

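library(dplyr)
library(purrr)

list_carbs <- c(1, 3, 4, 4)

# For each requested carb value, filter mtcars and draw one mpg at random.
get_sample_cars <- function(list_carbs) {
  filtered_cars <- map(list_carbs, ~ mtcars %>% filter(carb == .x))

  # Return the list of sampled values explicitly.
  map(filtered_cars, ~ sample(.x$mpg, size = 1))
}

mpg_cars <- get_sample_cars(list_carbs)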

Here are two examples of the expected result:

mpg    carb
27.3    1
16.4    3
19.2    4
10.4    4

mpg    carb
32.4    1
17.3    3
19.2    4
14.7    4

CodePudding user response：

You can probably simplify your code with just this:

mpg_cars <- sample(mtcars$mpg[mtcars$carb %in% list_carbs], size = 3)

That is, filter the column you want by subsetting the data however you like, then sample from the rows that remain. Note that this draws from the pooled subset, so it does not guarantee one value per carb.
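
Sampling from a pooled subset does not tie each draw to a particular carb value. If one draw per requested carb is needed, duplicates included, a minimal base R sketch could look like the following. The names mpg_by_carb and draw_one are made up for illustration, and the approach assumes every requested carb actually occurs in the data.

```r
# Requested carb values; a repeated value means "draw again from that group".
list_carbs <- c(1, 3, 4, 4)

# Split mpg by carb once, so each request is a cheap list lookup
# instead of a fresh filter over the whole data frame.
mpg_by_carb <- split(mtcars$mpg, mtcars$carb)

# Draw one element by index. sample.int() avoids the sample(x, 1) pitfall
# where a length-one numeric x is treated as the range 1:x.
draw_one <- function(x) x[sample.int(length(x), 1)]

# One independent draw per requested carb value.
mpg_cars <- vapply(mpg_by_carb[as.character(list_carbs)], draw_one, numeric(1))

res <- data.frame(mpg = unname(mpg_cars), carb = list_carbs)
res
```

The split happens only once, so the per-request work no longer scans the full table, which is likely what matters most with a million rows.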

CodePudding user response：

library(dplyr)

# One random row per distinct carb value in list_carbs.
filter(mtcars, carb %in% list_carbs) %>%
  group_by(carb) %>%
  slice_sample(n = 1)

# A tibble: 3 x 11
# Groups:   carb [3]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
2  16.4     8 276.    180  3.07  4.07  17.4     0     0     3     3
3  13.3     8 350     245  3.73  3.84  15.4     0     0     3     4

EDIT: to draw as many values per carb as that carb appears in list_carbs (so carb 4 is sampled twice):

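library(dplyr)
library(purrr)
library(tidyr)

mtcars %>%
  select(carb, mpg) %>%
  nest_by(carb) %>%
  filter(carb %in% list_carbs) %>%
  # table(list_carbs) gives the number of draws needed for each carb
  mutate(data = map2(data, table(list_carbs)[as.character(carb)],
                     ~ sample(.x, .y))) %>%
  unnest(data)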

# A tibble: 4 x 2
# Groups:   carb [3]
   carb  data
  <dbl> <dbl>
1     1  22.8
2     3  16.4
3     4  14.3
4     4  14.7
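
The counting idea used above (tabulate how often each carb is requested, then draw that many values from the matching group) can also be sketched in base R without any packages. This is an illustration under assumptions, not a benchmarked solution: the names counts, mpg_by_carb and draws are hypothetical, and like sample(.x, .y) it draws without replacement within a group, so a carb cannot be requested more times than it has rows.

```r
list_carbs <- c(1, 3, 4, 4)

# Number of draws needed per distinct carb value.
counts <- table(list_carbs)

# Keep only the groups that were requested, in the same order as counts.
mpg_by_carb <- split(mtcars$mpg, mtcars$carb)[names(counts)]

# Draw n values from each group by index, without replacement.
draws <- Map(function(x, n) x[sample.int(length(x), n)],
             mpg_by_carb, as.integer(counts))

res <- data.frame(
  carb = rep(as.numeric(names(counts)), times = as.integer(counts)),
  mpg  = unlist(draws, use.names = FALSE)
)
res
```

Each group is visited once no matter how many times it is requested, so the cost grows with the number of distinct carb values instead of the length of list_carbs.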