So I have a vector of numbers, say 1:8000. I want generate exactly n
distinct permutations of this vector. What is a way to do this without having to calculate all permutations (since 8000! is huge and can't fit in RAM).
function distinct_permutations(L, N){
# Return N distinct permutations of L as a list of lists
return(x)
}
x <- seq(1:8000)
CodePudding user response:
Using RcppAlgos::permuteSample
, here demonstrated with n=3 samples of 10 out of 10.
set.seed(42)
RcppAlgos::permuteSample(v=10, m=10, n=3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 9 8 5 3 4 1 10 6 2 7
# [2,] 1 3 5 9 7 6 8 10 2 4
# [3,] 2 8 5 9 4 7 3 10 6 1
CodePudding user response:
I will implement my first idea from 3 proposed in comments and compared with the already provided by other User. My solution is more universal as any set of values could be provided, I provide a list of vectors and my solution is faster for many scenarios.
perm_n <- function(vec, n) {
# I sample 2 * N samples as a risk of duplicates for bigger samples
unique(lapply(seq_len(n*2), function(x) sample(vec, length(vec))))[1:n]
}
microbenchmark::microbenchmark(
RcppAlgos::permuteSample(v=10, m=10, n=10),
perm_n(1:10, 10)
)
#> Unit: microseconds
#> expr min lq mean
#> RcppAlgos::permuteSample(v = 10, m = 10, n = 10) 1028.510 1346.6275 3644.6420
#> perm_n(1:10, 10) 147.064 161.0525 352.9444
#> median uq max neval
#> 1825.1380 5253.6080 64103.04 100
#> 174.5465 208.9115 16801.58 100
microbenchmark::microbenchmark(
RcppAlgos::permuteSample(v=8000, m=8000, n=10),
perm_n(1:8000, 10),
times = 10
)
#> Unit: milliseconds
#> expr min lq
#> RcppAlgos::permuteSample(v = 8000, m = 8000, n = 10) 205.790296 206.467661
#> perm_n(1:8000, 10) 6.479791 6.498455
#> mean median uq max neval
#> 207.490482 207.208447 208.037616 210.12633 10
#> 6.890623 6.571103 6.695908 8.42595 10
Created on 2022-11-23 with reprex v2.0.2