I have a data frame and a character vector:
dat = data.frame(a = c(1,0,3), b = c(9, 8, 7))
vec = c("A", "B", "C", "D", "E")
I am trying to group_by() on columns a and b, then use summarise() to create another, nested column c whose entries are samples of size b - a drawn from vec. If I need, say, 3 samples per (a, b) group, I can do this with:
df = dat %>%
  group_by_all() %>%
  summarise(c = replicate(3, list(mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), b, a))))
This gives me what I want, but I don't like how the data looks. The entries of column c are 1-column matrices such as <chr [8 x 1]>, so they look something like this:
[[1]]
[,1]
[1,] "B"
[2,] "D"
[3,] "A"
[4,] "C"
[5,] "A"
[6,] "E"
[7,] "B"
[8,] "C"
As a result, when I unnest c by running
df %>% unnest(c)
the name of column c changes to c[,1]. I could rename it, but this all feels a bit messy. Is there any way to get df to contain entries that look like the following?
[[1]]
[1] "B" "D" "A" "C" "A" "E" "B" "C"
Answer:
Because you've already grouped by all variables, there's only one a and b value per group, so you don't actually need mapply() in this case. The reason your column name looks wrong after unnesting is that you've inadvertently created a list of 1-column matrices rather than vectors: mapply() simplifies its result by default (SIMPLIFY = TRUE), so a single call that returns a length-8 vector comes back as an 8 x 1 matrix. The [,1] appearing in your column name tells you that you're seeing the first column of the matrices that were in the previously nested column c.
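To see where the matrix comes from, here is a minimal base-R sketch comparing mapply()'s default simplification with SIMPLIFY = FALSE. The values 9 and 1 are simply b and a from the first row of dat; the names m and l are illustrative only.

```r
vec <- c("A", "B", "C", "D", "E")

# One (b, a) pair: mapply() calls the function once, gets a length-8
# character vector, and by default simplifies that single result into a matrix
m <- mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), 9, 1)
is.matrix(m)  # TRUE
dim(m)        # 8 1

# With SIMPLIFY = FALSE the result stays a list holding a plain vector
l <- mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), 9, 1,
            SIMPLIFY = FALSE)
is.matrix(l[[1]])  # FALSE
length(l[[1]])     # 8
```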
I think this does what you're looking for:
library(tidyverse)
dat <- data.frame(a = c(1,0,3), b = c(9, 8, 7))
vec <- c("A", "B", "C", "D", "E")
n_rep <- 3 # samples per group
result <-
  dat %>%
  group_by(across(everything())) %>%
  summarise(
    c = replicate(n_rep, sample(vec, b - a, replace = TRUE), simplify = FALSE),
    .groups = "drop"
  )
result
#> # A tibble: 9 x 3
#> a b c
#> <dbl> <dbl> <list>
#> 1 0 8 <chr [8]>
#> 2 0 8 <chr [8]>
#> 3 0 8 <chr [8]>
#> 4 1 9 <chr [8]>
#> 5 1 9 <chr [8]>
#> 6 1 9 <chr [8]>
#> 7 3 7 <chr [4]>
#> 8 3 7 <chr [4]>
#> 9 3 7 <chr [4]>
# Check one of the elements of result$c:
result$c[[1]]
#> [1] "E" "C" "A" "A" "A" "E" "B" "C"
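On dplyr 1.1.0 or later, summarise() warns that returning more than one row per group is deprecated and suggests reframe() instead. Here is a sketch of the same approach using reframe(), assuming dplyr >= 1.1.0 and tidyr are installed, followed by the unnest() step from the question. The seed is arbitrary and only there for reproducibility; the name long is illustrative.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(a = c(1, 0, 3), b = c(9, 8, 7))
vec <- c("A", "B", "C", "D", "E")
n_rep <- 3  # samples per (a, b) group

set.seed(42)  # arbitrary seed, for reproducibility only

# reframe() (dplyr >= 1.1.0) allows several rows per group and always
# returns an ungrouped tibble, so .groups = "drop" is not needed
result <- dat %>%
  group_by(across(everything())) %>%
  reframe(c = replicate(n_rep, sample(vec, b - a, replace = TRUE), simplify = FALSE))

# Each element of result$c is a plain character vector, so unnesting
# gives an ordinary character column that is still named `c`
long <- result %>% unnest(cols = c)
```

Because no element of c is a matrix, the unnested column keeps the name c with no [,1] suffix.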