I want to find all the possible ways a number of categories can be similar to each other in R. The total number of combinations is calculated as Bell's number. E.g. there are 15 ways to organise 4 categories.
I am trying to write some R code that shows those 15 ways as binary vectors.
This first block of code shows there are four classes, each of which can take a the value of a, b, or c.
set.seed(123)
n_classes = 4
classes = LETTERS[1:n_classes]
state = sample(letters[1:3], size = n_classes, replace = TRUE)
# "c" "b" "b" "b"
Comparing each class to each other we can get a structural representation of this system:
comparisons = outer(state, state, "==") %>%
.[lower.tri(.)]
names(comparisons) = outer(classes, classes, paste0) %>%
.[lower.tri(.)]
# > comparisons
# BA CA DA CB DB DC
# TRUE TRUE FALSE TRUE FALSE FALSE
You can see B and A are the same, and C and A are the same, so by deduction C and B are the same and all other comparisons are different.
We know this is one of 15 solutions for four categories - I want to know the other 14 in this binary format.
I first tried using expand.grid, but this doesnt account for the conditional structure of the categories (it just shows all the possibilities for each cell being 1 or 0)
# not correct
poss = rep(list(0:1), length(comparisons))
possibilities = expand.grid(poss)
I feel like there is a simple solution, but I am currently blind to it.
Thanks in advance.
CodePudding user response:
You can use the R package partitions
to get all possible partitions of a set of categories.
library(partitions)
# the set to be partitioned
categories <- letters[1:4]
# the clusters defining the partition
clusters <- LETTERS[1:4]
categories |> length() |> listParts()
#> [[1]]
#> [1] (1,2,3,4)
#>
#> [[2]]
#> [1] (1,2,4)(3)
#>
#> [[3]]
#> [1] (1,2,3)(4)
#>
#> [[4]]
#> [1] (1,3,4)(2)
#>
#> [[5]]
#> [1] (2,3,4)(1)
#>
#> [[6]]
#> [1] (1,4)(2,3)
#>
#> [[7]]
#> [1] (1,2)(3,4)
#>
#> [[8]]
#> [1] (1,3)(2,4)
#>
#> [[9]]
#> [1] (1,4)(2)(3)
#>
#> [[10]]
#> [1] (1,2)(3)(4)
#>
#> [[11]]
#> [1] (1,3)(2)(4)
#>
#> [[12]]
#> [1] (2,4)(1)(3)
#>
#> [[13]]
#> [1] (2,3)(1)(4)
#>
#> [[14]]
#> [1] (3,4)(1)(2)
#>
#> [[15]]
#> [1] (1)(2)(3)(4)
categories |> length() |> setparts()
#>
#> [1,] 1 1 1 1 2 1 1 1 1 1 1 2 2 2 1
#> [2,] 1 1 1 2 1 2 1 2 2 1 2 1 1 3 2
#> [3,] 1 2 1 1 1 2 2 1 3 2 1 3 1 1 3
#> [4,] 1 1 2 1 1 1 2 2 1 3 3 1 3 1 4
categories |>
length() |>
setparts() |>
as.matrix() |>
as.data.frame() |>
lapply(function(x) sapply(x, function(y) clusters[[y]])) |>
as.data.frame()
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#> 1 A A A A B A A A A A A B B B A
#> 2 A A A B A B A B B A B A A C B
#> 3 A B A A A B B A C B A C A A C
#> 4 A A B A A A B B A C C A C A D
Created on 2022-05-16 by the reprex package (v2.0.0)