Inconsistent behaviour between sample_n() and slice_sample()

Time:01-03

I ran into a simple but tricky problem when trying to replace sample_n() with its successor, slice_sample(), inside a map() call. I am trying to replicate an example^ that samples 1, 2, and 3 rows from the mtcars dataset.

Running the example code with sample_n():

map(c(1, 2, 3), sample_n, tbl = mtcars)

I get:

[[1]]
          mpg cyl disp hp drat  wt  qsec vs am gear carb
Fiat 128 32.4   4 78.7 66 4.08 2.2 19.47  1  1    4    1

[[2]]
                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Mazda RX4 Wag      21.0   6  160 110 3.90 2.875 17.02  0  1    4    4

[[3]]
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1

But when I try the slice_sample() function:

map(c(1, 2, 3), slice_sample, .data = mtcars)

I get:

Error in `map()`:
ℹ In index: 1.
Caused by error in `.f()`:
! `n` must be explicitly named.
ℹ Did you mean `slice_sample(n = 1)`?
Run `rlang::last_error()` to see where the error occurred.

The sample_n() help page says:

sample_n() and sample_frac() have been superseded in favour of slice_sample().
... These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions.

I have read through both help pages and run a series of experiments, but didn't get very far. Deep down I know it must be possible. Could anyone give me a hint?


^: Page 217, Chapter 8, Beyond Spreadsheets with R: A beginner's guide to R and RStudio

CodePudding user response:

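The error message points at the cause:

! n must be explicitly named.

map() passes each element of the vector to the function as an unnamed argument. In sample_n(), once tbl is supplied by name, that unnamed value is matched to size, the next positional argument. In slice_sample(), n and prop come after ... in the signature, so they can only be matched by name; the unnamed value ends up in ... and triggers the error. In your case, you can use an anonymous function to get the expected output: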

map(c(1, 2, 3), ~ slice_sample(mtcars, n = .x))

In general, it is better to use an anonymous function than to pass extra arguments through ... in map functions. As the purrr documentation notes, this avoids confusing situations:

We also recommend using an anonymous function instead of passing additional arguments to map. This avoids a certain class of moderately esoteric argument matching woes and, we believe, is generally easier to read.
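As a minimal sketch of that recommendation, the two anonymous-function spellings below should behave the same way. The variable names (sizes, samples_formula, samples_lambda) are made up for illustration, the \(n) shorthand assumes R 4.1 or later, and set.seed() is there only to make the random draws repeatable:

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: any seed works, it only fixes the random draws

sizes <- c(1, 2, 3)  # hypothetical name for the sample sizes

# purrr formula shorthand: .x is the current element, and n is named explicitly
samples_formula <- map(sizes, ~ slice_sample(mtcars, n = .x))

# Base R lambda shorthand (R >= 4.1), equivalent to function(n) ...
samples_lambda <- map(sizes, \(n) slice_sample(mtcars, n = n))

# Check that each element has the requested number of rows
map_int(samples_lambda, nrow)
```

Because n is named inside the anonymous function, nothing depends on where map() places its unnamed argument, and the same pattern works with prop = .x for proportions.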
