Home > OS >  How to efficiently generate unique permutation with R
How to efficiently generate unique permutation with R

Time:11-09

I have the following code that generate the unique permutation:

library(magrittr)
library(tictoc)

count_unique_perm <- function(l = NULL) {
  lo <- combinat::permn(l)
  do.call(rbind, lapply(lo, paste0, collapse = ""))[, 1] %>%
    unique() %>%
    length()
}

It already give the correct result. With this input:

l1 <- c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P") # 720
l2 <- c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S") # 30,240

But it's running extremely slow.

tic()
count_unique_perm(l = l1)
toc()
#118.155 sec elapsed

#107.793 sec elapsed for l2

How can I speed it up?

CodePudding user response:

You don't need to generate permutations, there is closed formula. You can use package iterpc:

iterpc::multichoose(table(l1))

or in base:

factorial(length(l2)) / prod(factorial(table(l2)))

CodePudding user response:

Try the RcppAlgos package, which will return permutations of multisets by using the freqs argument.

library(RcppAlgos)
library(microbenchmark)

# get a matrix of unique permutations
x <- table(c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P"))
y <- table(c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S"))

microbenchmark(permx = permuteGeneral(names(x), freqs = x),
               permy = permuteGeneral(names(y), freqs = y))
#> Unit: microseconds
#>   expr    min     lq     mean  median      uq    max neval
#>  permx   32.3   38.0   44.018   42.05   47.95   64.8   100
#>  permy 1538.8 1567.7 1751.259 1606.60 1649.35 5082.5   100
dim(permuteGeneral(names(x), freqs = x))
#> [1] 720  10
dim(permuteGeneral(names(y), freqs = y))
#> [1] 30240    10

To get just the number of unique permutations, use permuteCount.

microbenchmark(permx = permuteCount(names(x), freqs = x),
               permy = permuteCount(names(y), freqs = y))
#> Unit: microseconds
#>   expr min  lq  mean median  uq  max neval
#>  permx 1.5 1.6 1.791    1.6 1.8  6.6   100
#>  permy 1.5 1.6 2.260    1.7 1.8 46.2   100
permuteCount(names(x), freqs = x)
#> [1] 720
permuteCount(names(y), freqs = y)
#> [1] 30240
  • Related