subsample random rows of tibble

Time:10-17

Suppose I have two data objects, df.A and df.B.

df.A <- structure(list(Species = structure(c(7L, 7L, 1L, 1L, 1L, 1L, 
4L, 6L, 5L, 5L), .Label = c("Carcharhinus leucas", "Carcharhinus limbatus", 
"Carcharhinus perezi", "Galeocerdo cuvier", "Ginglymostoma cirratum", 
"Hypanus americanus", "Negaprion brevirostris", "Sphyrna mokarran"
), class = "factor"), Sex = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")

> class(df.A)
[1] "data.frame"


df.B <- structure(list(Diel.phase = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 
2L, 1L, 1L, 1L), .Label = c("Day", "Night"), class = "factor"), 
    Season = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 
    2L), .Label = c("Summer", "Winter"), class = "factor")), row.names = c(NA, 
-10L), groups = structure(list(.rows = structure(list(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))

> class(df.B)
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

Let's say I want to subsample 2 rows from each object. The code below works for df.A but not for df.B. Instead, all rows for df.B are returned.

  df.B %>% slice_sample(n=2)

Can someone explain this result? And how can I apply slice_sample to an object with the same class as df.B without converting it back to a plain data.frame first?

CodePudding user response:

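df.B is a rowwise_df, so dplyr treats every row as its own group. slice_sample(n=2) samples within each group, and because each group holds only one row, the request is silently truncated to that single row per group. The result is every row of df.B.
Remove the rowwise grouping with ungroup() before sampling. The result is still a tibble, so there is no need to convert to a data.frame: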

df.B %>% ungroup() %>%  slice_sample(n=2)
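
To see the effect of the grouping in isolation, here is a minimal sketch. It assumes dplyr is installed and uses a small made-up tibble, `df`, as a stand-in for df.B. The name `sampled` is also just illustrative.

```r
library(dplyr)

# Hypothetical stand-in for df.B: a small tibble marked as rowwise
df <- tibble(
  Diel.phase = c("Night", "Night", "Day", "Night", "Day"),
  Season     = c("Winter", "Summer", "Summer", "Winter", "Winter")
) %>%
  rowwise()

# The rowwise class is what makes dplyr treat each row as its own group
class(df)

# Dropping the rowwise grouping lets slice_sample() draw from the whole table
sampled <- df %>%
  ungroup() %>%
  slice_sample(n = 2)

nrow(sampled)   # 2
class(sampled)  # still a tibble, no longer rowwise
```

If the rowwise behaviour is needed again later in the pipeline, calling rowwise() on the sampled result should restore it.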