Using lapply to replace values in a vector with randomly sampled values from another vector

I am trying to replace values in a vector word, at the indices given by the vector positions, with values sampled from a third vector called letters.

Here's an example of what my vectors look like:

word <- c("A","E","C","A","R","O","P")

positions <- c(1,5,3,7)

letters <- c("A","B","C","D","E","F")

One important detail is that each value in word[positions] must change after sampling. Because letters and word share some values, a plain draw could return the value that was already there.
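One quick way to see where a collision is possible at all is to check which of the selected values also occur in letters. Only those values can be redrawn unchanged, each with probability 1/length(letters) per draw. This is a small sketch that reuses the example vectors above. The names current and p_same are just illustrative.

```r
word <- c("A","E","C","A","R","O","P")
positions <- c(1, 5, 3, 7)
letters <- c("A","B","C","D","E","F")  # note: this masks base::letters

# the values that are about to be replaced
current <- word[positions]

# one uniform draw from letters reproduces the current value with
# probability 1/length(letters) if that value is in letters, otherwise never
p_same <- ifelse(current %in% letters, 1 / length(letters), 0)

current
p_same
```

Here the selected values are "A", "R", "C" and "P". Only "A" and "C" also appear in letters, so only positions 1 and 3 can end up unchanged after a single draw.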

The current code that I am using to do this is:

for (i in 1:length(positions)){
  temp <- word[[positions[i]]] 
  word[[positions[i]]] <- sample(letters, 1)
  while (word[[positions[i]]] == temp) {
    word[[positions[i]]] <- sample(letters, 1) 
  }
}

While this works, I suspect it's quite inefficient. Since the order in which I change the values doesn't matter, it seems like this should be possible without an explicit loop. I've been trying to use one of the "apply" family of functions to solve this, but I am having trouble figuring out a solution.

Thank you very much for your attention!

Answer:

You can do this:

word[positions] <- sapply(word[positions], 
                          \(w) sample(setdiff(letters, w), 1))

Inside sapply, setdiff() removes the current value w from letters before sampling, so the replacement is guaranteed to differ from it. Note that the \(w) shorthand for function(w) requires R 4.1.0 or later.
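Here is a slightly more defensive sketch of the same setdiff() idea. It assumes the inputs are character vectors. vapply() pins each result to character(1). An explicit check covers the edge case where setdiff() leaves nothing to sample, for example when letters contains only the current value, which would otherwise make sample() fail. replace_distinct and candidates are hypothetical names chosen for illustration.

```r
# hypothetical helper: same idea as the sapply/setdiff answer
replace_distinct <- function(word, positions, candidates) {
  word[positions] <- vapply(
    word[positions],
    function(w) {
      pool <- setdiff(candidates, w)  # drop the current value
      if (length(pool) == 0) stop("no candidate differs from '", w, "'")
      # index-based draw: also safe if pool were a numeric vector of length 1
      pool[sample.int(length(pool), 1)]
    },
    character(1),
    USE.NAMES = FALSE
  )
  word
}

set.seed(1)
word <- c("A","E","C","A","R","O","P")
positions <- c(1, 5, 3, 7)
candidates <- c("A","B","C","D","E","F")
new_word <- replace_distinct(word, positions, candidates)
new_word
```

Indexing with sample.int() instead of calling sample() on the pool directly avoids the well-known sample(x, 1) surprise when x is a single number. For character vectors, as in this question, both forms behave the same.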

Also note that letters is a built-in R constant (containing the lowercase English alphabet, see ?letters), so it is generally not a good idea to use this name for user-defined variables.
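A short illustration of what the masking looks like in practice. The assignment creates a variable in the current environment that shadows the constant, which stays reachable through base:: and comes back once the shadowing copy is removed.

```r
# assigning to `letters` creates a variable that shadows base::letters
letters <- c("A","B","C","D","E","F")
masked <- letters[1:3]          # the user-defined vector
builtin <- base::letters[1:3]   # the built-in constant is still reachable

# removing the shadowing copy makes `letters` refer to the built-in again
rm(letters)
restored <- letters[1:3]
```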

Answer:

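Since only a small share of draws reproduce the original value (at most 1 in length(letters) per position), redrawing just the colliding positions in a vectorized loop finishes in a few rounds and is very fast.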

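rreplace <- function(x, y, i) {
  v <- x  # keep the original values for comparison
  
  while (length(i)) {
    # draw a replacement for every position that still needs one
    x[i] <- sample(y, length(i), replace = TRUE)
    # keep only the positions where the draw reproduced the original value
    i <- i[v[i] == x[i]]
  }
  
  x
}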

word <- c("A","E","C","A","R","O","P")
positions <- c(1,5,3,7)
letters <- c("A","B","C","D","E","F")

rreplace(word, letters, positions)
#> [1] "C" "E" "D" "A" "A" "O" "F"
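To check the claim that only a few redraw rounds are needed, here is a hypothetical instrumented copy of rreplace(). The name rreplace_trace and the round_sizes field are made up for illustration. The function records how many positions are still pending at the start of each round. The inputs mirror the larger benchmark example, with a fixed seed.

```r
# hypothetical instrumented copy of rreplace(): same loop, but it also
# records the number of pending positions at the start of each round
rreplace_trace <- function(x, y, i) {
  v <- x
  sizes <- integer(0)
  while (length(i)) {
    sizes <- c(sizes, length(i))
    x[i] <- sample(y, length(i), replace = TRUE)
    i <- i[v[i] == x[i]]
  }
  list(result = x, round_sizes = sizes)
}

set.seed(42)
word <- sample(LETTERS, 1e5, replace = TRUE)
candidates <- LETTERS[1:15]
positions <- sample(length(word), 1e4)
out <- rreplace_trace(word, candidates, positions)
out$round_sizes
```

For these inputs one would expect roughly 10000 pending positions in the first round. About 10000/26 ≈ 385 would remain in the second, because a collision needs the original letter to be among the 15 candidates and then be redrawn: 15/26 × 1/15 = 1/26. That leaves about 26 in the third round, so only a handful of rounds in total. The exact counts depend on the seed.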

A larger example for benchmarking:

word <- sample(LETTERS, 1e5, replace = TRUE)
letters <- LETTERS[1:15]
positions <- sample(length(word), 1e4)
# check that exactly the selected positions were changed
word2 <- rreplace(word, letters, positions)
all(word[positions] != word2[positions])
#> [1] TRUE
all(word[-positions] == word2[-positions])
#> [1] TRUE

microbenchmark::microbenchmark(rreplace = rreplace(word, letters, positions),
                               RobertHacken = sapply(word[positions], function(w) sample(setdiff(letters, w), 1)))
#> Unit: milliseconds
#>          expr      min        lq       mean    median       uq      max neval
#>      rreplace   1.1181   1.33035   1.665195   1.61985   1.8919   3.9958   100
#>  RobertHacken 104.9374 145.25685 151.923915 156.94295 165.9491 198.5219   100