Home > Back-end >  Determining All Possible Combinations of Items With a Grouping Variable
Determining All Possible Combinations of Items With a Grouping Variable

Time:06-10

This question is very similar to this one, but instead of putting one item from each original group into each output group, I want multiple items from each original group in each output group.

I have the following data.

list1 <- list(Group_1 = c("1", "2", "3", "4", "5", "6"), Group_2 = c("13", "14", "15", "16", "17", "18"), Group_3 = c("19", "20", "21", "22", "23", "24", "25"))
Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_Output_Group <- 2

Group_1 contains 6 items, Group_2 contains 6 items, and Group_3 contains 7 items. Based on these initial groups, I want to place items into 3 new groups, making sure that each new group contains the same number (2, or Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_Output_Group) of items in it from each original group. Furthermore, none of the items can be repeated in each of these 3 new groups - each item can only be used once for each of the new groups. For example, one possible output could be this output below.

(list(New_Group_1 = c("1", "2", "13", "14", "19", "20"), New_Group_2 = c("3", "4", "15", "16", "21", "22"), New_Group_3 = c("5", "6", "17", "18", "23", "24")))

I'd actually like to generate a list of output that shows every possible output (every possible combination). For each list element, there should be three new groups, and each group should have distinct items in it. For example, in addition to the solution above, another one could be the following.

(list(New_Group_1 = c("3", "4", "13", "14", "19", "20"), New_Group_2 = c("5", "6", "15", "16", "21", "22"), New_Group_3 = c("1", "2", "17", "18", "23", "24")))

The question I asked here is very similar, except in that question, I only put one item from each original group into each new group. Here, I'd like to have an option of putting more than 1 item from each original group into each new group.

CodePudding user response:

This is fairly tricky, and quite slow, since there are 23,625 possible combinations of the three groups as specified

n <- 2

all_pairs <- lapply(list1, function(group) {
  all_combs <- apply(combn(ncol(combn(length(group), n)), length(list1)), 
                     2,
        function(x) c(combn(length(group), n)[,x]), simplify = FALSE)
  
  all_combs[sapply(all_combs, function(x) !any(duplicated(x)))]
})

index_df <- do.call(expand.grid, lapply(all_pairs, function(x) seq(length(x))))

result <- apply(index_df, 1, function(x) {
  m <- t(sapply(seq_along(x), function(y) list1[[y]][all_pairs[[y]][[x[y]]]]))
  setNames(lapply(split(m, (seq_along(m)-1) %/% (length(list1) * n)),
         function(x) c(t(matrix(x, ncol = n)))), names(list1))
}, simplify = FALSE)

The resulting list looks as follows:

head(result)
#> [[1]]
#> [[1]]$Group_1
#> [1] "1"  "2"  "13" "14" "19" "20"
#> 
#> [[1]]$Group_2
#> [1] "3"  "4"  "15" "16" "21" "22"
#> 
#> [[1]]$Group_3
#> [1] "5"  "6"  "17" "18" "23" "24"
#> 
#> 
#> [[2]]
#> [[2]]$Group_1
#> [1] "1"  "2"  "13" "14" "19" "20"
#> 
#> [[2]]$Group_2
#> [1] "3"  "5"  "15" "16" "21" "22"
#> 
#> [[2]]$Group_3
#> [1] "4"  "6"  "17" "18" "23" "24"
#> 
#> 
#> [[3]]
#> [[3]]$Group_1
#> [1] "1"  "2"  "13" "14" "19" "20"
#> 
#> [[3]]$Group_2
#> [1] "3"  "6"  "15" "16" "21" "22"
#> 
#> [[3]]$Group_3
#> [1] "4"  "5"  "17" "18" "23" "24"
#> 
#> 
#> [[4]]
#> [[4]]$Group_1
#> [1] "1"  "3"  "13" "14" "19" "20"
#> 
#> [[4]]$Group_2
#> [1] "2"  "4"  "15" "16" "21" "22"
#> 
#> [[4]]$Group_3
#> [1] "5"  "6"  "17" "18" "23" "24"
#> 
#> 
#> [[5]]
#> [[5]]$Group_1
#> [1] "1"  "3"  "13" "14" "19" "20"
#> 
#> [[5]]$Group_2
#> [1] "2"  "5"  "15" "16" "21" "22"
#> 
#> [[5]]$Group_3
#> [1] "4"  "6"  "17" "18" "23" "24"
#> 
#> 
#> [[6]]
#> [[6]]$Group_1
#> [1] "1"  "3"  "13" "14" "19" "20"
#> 
#> [[6]]$Group_2
#> [1] "2"  "6"  "15" "16" "21" "22"
#> 
#> [[6]]$Group_3
#> [1] "4"  "5"  "17" "18" "23" "24"

Created on 2022-06-09 by the reprex package (v2.0.1)

  • Related