Group_by and mapply on sample returning 2d array instead of list

Time:08-05

I have a data frame and a character vector:

dat = data.frame(a = c(1,0,3), b = c(9, 8, 7))
vec = c("A", "B", "C", "D", "E")

and I am trying to group by columns a and b, then use summarise to create a nested list column c whose entries are samples of size b - a drawn from vec. If I need, say, 3 samples per (a, b) group, I can do this with:

df = dat %>%
  group_by_all() %>%
  summarise(c = replicate(3, list(mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), b, a))))

which gives me what I want, but I don't like how the data looks. The entries of column c are one-column matrices such as <chr [8 x 1]>, so they look something like

[[1]]
     [,1]
[1,] "B" 
[2,] "D" 
[3,] "A" 
[4,] "C" 
[5,] "A" 
[6,] "E" 
[7,] "B" 
[8,] "C"

so when I unnest c by running

df %>% unnest(c)

the name of column c changes to c[,1]. I could rename it, but this all feels a bit messy. Is there any way to get df to contain entries that look like the following instead?

[[1]]
[1] "B" "D" "A" "C" "A" "E" "B" "C"

Answer:

Because you've already grouped by all variables, each group has exactly one a value and one b value, so you don't need mapply here. The column name looks wrong after unnesting because mapply simplifies its result by default (SIMPLIFY = TRUE), so you've inadvertently created a list of 1-column matrices rather than vectors. The [,1] appended to the column name tells you that you're seeing the first column of the matrices that were stored in the nested column c.
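The simplification behaviour can be reproduced in base R without dplyr. This is a minimal sketch, assuming a single (a, b) pair of (1, 9) as in the first row of dat; the names m and l are illustrative and not from the question.

```r
# Base R only. vec is copied from the question; m and l are illustrative names.
vec <- c("A", "B", "C", "D", "E")

# Default SIMPLIFY = TRUE: the single length-8 result becomes an 8 x 1 matrix.
m <- mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), 9, 1)
dim(m)
#> [1] 8 1

# SIMPLIFY = FALSE returns a list whose element is a plain character vector.
l <- mapply(function(x, y) sample(vec, size = x - y, replace = TRUE), 9, 1,
            SIMPLIFY = FALSE)
is.null(dim(l[[1]]))
#> [1] TRUE
length(l[[1]])
#> [1] 8
```

So if you wanted to keep mapply for some reason, passing SIMPLIFY = FALSE should avoid the matrices, although dropping mapply altogether is simpler when each group has a single a and b.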

I think this does what you're looking for:

library(tidyverse)

dat <- data.frame(a = c(1,0,3), b = c(9, 8, 7))
vec <- c("A", "B", "C", "D", "E")
n_rep <- 3 # number of samples per group

result <-
  dat %>% 
  group_by(across(everything())) %>% 
  summarise(
    c = replicate(n_rep, sample(vec, b - a, replace = TRUE), simplify = FALSE),
    .groups = "drop"
  )

result
#> # A tibble: 9 x 3
#>       a     b c        
#>   <dbl> <dbl> <list>   
#> 1     0     8 <chr [8]>
#> 2     0     8 <chr [8]>
#> 3     0     8 <chr [8]>
#> 4     1     9 <chr [8]>
#> 5     1     9 <chr [8]>
#> 6     1     9 <chr [8]>
#> 7     3     7 <chr [4]>
#> 8     3     7 <chr [4]>
#> 9     3     7 <chr [4]>

# Check one of the elements of result$c:
result$c[[1]]
#> [1] "E" "C" "A" "A" "A" "E" "B" "C"
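To confirm that the column name survives unnesting, the same pipeline can be followed by unnest(). This is a sketch under the assumptions of the example above (same dat, vec and n_rep); the name long is illustrative only.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(a = c(1, 0, 3), b = c(9, 8, 7))
vec <- c("A", "B", "C", "D", "E")
n_rep <- 3 # number of samples per group

# Same approach as above: c is a list column of plain character vectors.
result <- dat %>%
  group_by(across(everything())) %>%
  summarise(
    c = replicate(n_rep, sample(vec, b - a, replace = TRUE), simplify = FALSE),
    .groups = "drop"
  )

# Unnesting now yields an ordinary character column that is still called c.
long <- result %>% unnest(c)
names(long)
#> [1] "a" "b" "c"
nrow(long) # 3 * 8 + 3 * 8 + 3 * 4
#> [1] 60
```

Note that from dplyr 1.1.0 onward, a summarise() call that returns more than one row per group emits a deprecation warning pointing to reframe(), so on newer versions you may prefer to swap summarise() for reframe() and drop the .groups argument.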