I have the following code that generate the unique permutation:
library(magrittr)
library(tictoc)
count_unique_perm <- function(l = NULL) {
lo <- combinat::permn(l)
do.call(rbind, lapply(lo, paste0, collapse = ""))[, 1] %>%
unique() %>%
length()
}
It already give the correct result. With this input:
l1 <- c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P") # 720
l2 <- c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S") # 30,240
But it's running extremely slow.
tic()
count_unique_perm(l = l1)
toc()
#118.155 sec elapsed
#107.793 sec elapsed for l2
How can I speed it up?
CodePudding user response:
You don't need to generate permutations, there is closed formula. You can use package iterpc
:
iterpc::multichoose(table(l1))
or in base:
factorial(length(l2)) / prod(factorial(table(l2)))
CodePudding user response:
Try the RcppAlgos
package, which will return permutations of multisets by using the freqs
argument.
library(RcppAlgos)
library(microbenchmark)
# get a matrix of unique permutations
x <- table(c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P"))
y <- table(c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S"))
microbenchmark(permx = permuteGeneral(names(x), freqs = x),
permy = permuteGeneral(names(y), freqs = y))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> permx 32.3 38.0 44.018 42.05 47.95 64.8 100
#> permy 1538.8 1567.7 1751.259 1606.60 1649.35 5082.5 100
dim(permuteGeneral(names(x), freqs = x))
#> [1] 720 10
dim(permuteGeneral(names(y), freqs = y))
#> [1] 30240 10
To get just the number of unique permutations, use permuteCount
.
microbenchmark(permx = permuteCount(names(x), freqs = x),
permy = permuteCount(names(y), freqs = y))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> permx 1.5 1.6 1.791 1.6 1.8 6.6 100
#> permy 1.5 1.6 2.260 1.7 1.8 46.2 100
permuteCount(names(x), freqs = x)
#> [1] 720
permuteCount(names(y), freqs = y)
#> [1] 30240