slice_sample number of rows

Time:04-12

This is a general question about how slice_sample works. Starting from my original data frame, I am doing something like this:


df <- dat_longer %>%
      dplyr::select(grupo_int_v00, time, peso1, cintura1, hdl) %>%
      group_by(grupo_int_v00) %>%
      slice_sample(n = 20, replace = TRUE) %>%
      ungroup() %>%
      dput()

This gives me the following output:

df<-structure(list(grupo_int_v00 = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), label = "Grupo de intervención", class = "factor"), 
    time = c(0, 0, 2, 0, 2, 1, 1, 2, 2, 1, 1, 0, 2, 1, 2, 0, 
    1, 2, 1, 0, 0, 2, 2, 1, 0, 2, 2, 1, 0, 2, 1, 0, 1, 0, 1, 
    2, 1, 0, 0, 0), peso1 = c(100.7, 93, 84.5, 110.2, 76.4, 90.7, 
    93.6, 90.2, 84.8, 82.1, 125.3, 80.2, 76, 64.5, 86.9, 99, 
    83.9, 96.1, 91.6, 89.9, 93.4, 98.8, 70, 67.7, 110.3, 75, 
    87.2, 97.9, 82.7, 69.5, 81.2, 98, 73.8, 91.2, 87, 95, 76.6, 
    103.2, 103.4, 60), cintura1 = c(116.5, 112, 107, 127, NA, 
    106, 98.5, 124, 103.5, 107, 133.5, 104.5, 104.5, 97, 104.5, 
    107, 116, 110, 109, 113, 107, 105, 98, 101, 132, NA, 96.5, 
    118, 110, 85, 106.5, 123, 108, 107.5, 112, 117, 97.5, 114, 
    119, 94), hdl = c(56, 47, 61, 54, NA, 80, 61, 76, 50, 71, 
    64, 47, 59, 61, 59, 49, 49, 68, 71, 59, 55, 43, 52, 53, 42, 
    NA, 40, 40, 58, 60, 53, 62, 56, 48, 58, 39, 54, 63, 45, 45
    )), row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"
))

The result has 40 rows, even though I specified n = 20. I have gone through the function's arguments, but I don't really understand what is going on.

Thanks in advance

CodePudding user response:

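This happens because of group_by: on a grouped data frame, slice_sample draws n rows from each group, not n rows in total. Your data has two groups (A and B), so you get 2 × 20 = 40 rows. Here is an example using the iris dataset, which has three species: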

iris %>% 
  group_by(Species) %>%
  slice_sample(n = 5)

Output:

# A tibble: 15 × 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     
 1          4.8         3.4          1.9         0.2 setosa    
 2          5           3.3          1.4         0.2 setosa    
 3          5.2         3.5          1.5         0.2 setosa    
 4          4.5         2.3          1.3         0.3 setosa    
 5          5.1         3.8          1.5         0.3 setosa    
 6          5.6         3            4.5         1.5 versicolor
 7          6.5         2.8          4.6         1.5 versicolor
 8          5.8         2.6          4           1.2 versicolor
 9          5.5         2.4          3.7         1   versicolor
10          6.4         3.2          4.5         1.5 versicolor
11          6.7         3.3          5.7         2.1 virginica 
12          6.7         3            5.2         2.3 virginica 
13          5.7         2.5          5           2   virginica 
14          5.8         2.8          5.1         2.4 virginica 
15          7.2         3.2          6           1.8 virginica 

When using no group_by:

iris %>% 
  slice_sample(n = 5)

Output:

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          5.2         3.4          1.4         0.2     setosa
2          6.6         2.9          4.6         1.3 versicolor
3          7.2         3.6          6.1         2.5  virginica
4          5.5         3.5          1.3         0.2     setosa
5          4.7         3.2          1.6         0.2     setosa

Without grouping, n is the total number of rows, so it returns 5 rows.
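If the goal is a fixed total that is still balanced across groups, one option is to divide the desired total by the number of groups and sample that many rows per group. The following is only a sketch on iris; the variable names total, per_group_n and balanced are made up for illustration, and it assumes the total divides evenly by the number of groups:

```r
library(dplyr)

# Desired total number of rows (hypothetical value for illustration)
total <- 6

# Rows to draw from each group: total divided by the number of groups
per_group_n <- total / n_distinct(iris$Species)

balanced <- iris %>%
  group_by(Species) %>%
  slice_sample(n = per_group_n) %>%  # n is applied within each group
  ungroup()

# Check how many rows were drawn per group and in total
count(balanced, Species)
nrow(balanced)
```

For the data in the question this would mean n = 10 per group to end up with 20 rows, or dropping group_by altogether if the sample does not need to be balanced.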
