Home > Blockchain >  Generate N distinct permutations of a very large list in R?
Generate N distinct permutations of a very large list in R?

Time:11-23

So I have a vector of numbers, say 1:8000. I want generate exactly n distinct permutations of this vector. What is a way to do this without having to calculate all permutations (since 8000! is huge and can't fit in RAM).

function distinct_permutations(L, N){
  # Return N distinct permutations of L as a list of lists
  return(x)
}

x <- seq(1:8000)

CodePudding user response:

Using RcppAlgos::permuteSample, here demonstrated with n=3 samples of 10 out of 10.

set.seed(42)
RcppAlgos::permuteSample(v=10, m=10, n=3)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    9    8    5    3    4    1   10    6    2     7
# [2,]    1    3    5    9    7    6    8   10    2     4
# [3,]    2    8    5    9    4    7    3   10    6     1

CodePudding user response:

I will implement my first idea from 3 proposed in comments and compared with the already provided by other User. My solution is more universal as any set of values could be provided, I provide a list of vectors and my solution is faster for many scenarios.

perm_n <- function(vec, n) {
  # I sample 2 * N samples as a risk of duplicates for bigger samples
  unique(lapply(seq_len(n*2), function(x) sample(vec, length(vec))))[1:n]
}

microbenchmark::microbenchmark(
  RcppAlgos::permuteSample(v=10, m=10, n=10),
  perm_n(1:10, 10)
)
#> Unit: microseconds
#>                                              expr      min        lq      mean
#>  RcppAlgos::permuteSample(v = 10, m = 10, n = 10) 1028.510 1346.6275 3644.6420
#>                                  perm_n(1:10, 10)  147.064  161.0525  352.9444
#>     median        uq      max neval
#>  1825.1380 5253.6080 64103.04   100
#>   174.5465  208.9115 16801.58   100

microbenchmark::microbenchmark(
  RcppAlgos::permuteSample(v=8000, m=8000, n=10),
  perm_n(1:8000, 10),
  times = 10
)
#> Unit: milliseconds
#>                                                  expr        min         lq
#>  RcppAlgos::permuteSample(v = 8000, m = 8000, n = 10) 205.790296 206.467661
#>                                    perm_n(1:8000, 10)   6.479791   6.498455
#>        mean     median         uq       max neval
#>  207.490482 207.208447 208.037616 210.12633    10
#>    6.890623   6.571103   6.695908   8.42595    10

Created on 2022-11-23 with reprex v2.0.2

  • Related