Home > OS >  R expand.grid for repeated combinations of a vector in groups?
R expand.grid for repeated combinations of a vector in groups?

Time:10-23

Been working on this for two days, and don't see any progress. Say I have 20 numbers, and I want to (without replacement)

  • get one unique group of 10 numbers
  • get two unique groups of 3 numbers
  • get two unique groups of 2 numbers

There are (20 choose 10) * (10 choose 3) * (7 choose 3) * (4 choose 2) * (2 choose 2) = 4,655,851,200 total groups.

I'd like either a nested list or a data.frame that I can evaluate these groups over. Is that possible to do quickly in R?

Below I have a pretty unwieldy solution for the case when I'm just trying to get one unique group of 5, one unique group of 3, and one unique group of 2, from 10 numbers.

This code actually works, but quickly becomes untenable for larger groups. I'm wondering if there's a fast way to create this grid? It seems like it should be easily possible, but I can't figure out how. Thanks in advance!!


library(data.table)
library(gtools)

n <- 10
group_sizes <- c(5,3,2)

# initialize first group
group_3_values <-
  combinations(
    n = n,
    r = group_sizes[1], # size of the first group!
    repeats.allowed = T,
    v = 1:n
  )
group_3_values <- as.data.table(group_3_values)
group_3_values[, row := 1:nrow(group_3_values)]
output <- as.data.table(group_3_values)

# run for loop for the remaining groups
start_time <- Sys.time()
for (group_size in group_sizes[-1]) { # size of the second, etc. groups
  output_list <- list()
  for (i in 1:nrow(output)) {
    remaining_options <-
      setdiff(1:n, output[row == i, .SD, .SDcols = !c('row')])
    second_group <- combinations(
      n = length(remaining_options),
      r = group_size,
      repeats.allowed = F,
      v = remaining_options
    )
    second_group <- as.data.table(second_group)
    second_group[, row := i]
    out <-
      merge(
        output[row == i,],
        second_group,
        by = "row",
        all.x = T,
        all.y = T,
        allow.cartesian = T
      )
    out$row <- NULL
    output_list[[i]] <- as.data.table(out)
    out <- NULL
  }
  output <- as.data.table(rbindlist(output_list))
  output[, row := 1:nrow(output)]
}
end_time <- Sys.time()
duration <- end_time - start_time
print(duration)

CodePudding user response:

I think materializing a table or even a vector of 4,655,851,200 elements is not a good choice. Perhaps the best you can do is to use combn to generate each combination, with a callback to run your desired code.

This example generates and prints the elements for every combination and increases a counter n in the global scope. I use callCC for early exit at 10 iterations .

x <- 1:20

n<-0
callCC(function(exit)
  combn(x, 10, function(i)
    combn(setdiff(x, c(i)), 3, function(j)
      combn(setdiff(x, c(i,j)), 3, function(k)
        combn(setdiff(x, c(i,j,k)), 2, function(l)
          combn(setdiff(x, c(i,j,k,l)), 2, function(m){
            n<<-n 1
            # YOUR CODE HERE
            print(paste0(sapply(list(i,j,k,l,m), paste0, collapse=" "), collapse=" | "))
            if(n>=10) exit("early exit")
          }, simplify=F),
        simplify=F),
      simplify=F),
    simplify=F),
  simplify=F))

#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 17 18 | 19 20"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 17 19 | 18 20"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 17 20 | 18 19"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 18 19 | 17 20"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 18 20 | 17 19"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 16 | 19 20 | 17 18"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 17 | 16 18 | 19 20"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 17 | 16 19 | 18 20"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 17 | 16 20 | 18 19"
#> [1] "1 2 3 4 5 6 7 8 9 10 | 11 12 13 | 14 15 17 | 18 19 | 16 20"
#> [1] "early exit"
  • Related