I am trying to replace values in a list word, at the indexes specified by the list positions, by sampling values from a third list called letters.
Here's an example of what my lists look like:
word <- c("A","E","C","A","R","O","P")
positions <- c(1,5,3,7)
letters <- c("A","B","C","D","E","F")
One important detail is that the value at each index in positions must not remain the same after sampling, which can happen because letters and word share some values.
The current code that I am using to do this is:
for (i in 1:length(positions)){
temp <- word[[positions[i]]]
word[[positions[i]]] <- sample(letters, 1)
while (word[[positions[i]]] == temp) {
word[[positions[i]]] <- sample(letters, 1)
}
}
While this works, I realize that it's extremely inefficient, as the order in which I change the values in the list doesn't matter. I've been trying to use one of the "apply" family of functions to solve this, but I am having trouble figuring out a solution.
Thank you very much for the attention!
CodePudding user response:
You can do this:
word[positions] <- sapply(word[positions],
\(w) sample(setdiff(letters, w), 1))
Inside sapply, you always remove the current word from letters, so a different one is guaranteed to be sampled.
Also note that letters is a built-in R constant (containing the lowercase English alphabet, see ?letters), so it is generally not a good idea to use this name for user-defined variables.
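For reference, here is a minimal, self-contained sketch of the approach above. It assumes the candidate vector is renamed to pool (a hypothetical name) so it no longer masks the built-in letters, and it swaps sapply for vapply only to pin the return type to a single string per element. The set.seed call is there purely to make the sketch reproducible.

```r
set.seed(42)  # only to make the sketch reproducible

word      <- c("A", "E", "C", "A", "R", "O", "P")
positions <- c(1, 5, 3, 7)
pool      <- c("A", "B", "C", "D", "E", "F")  # hypothetical name, replaces `letters`

original <- word
word[positions] <- vapply(
  word[positions],
  function(w) sample(setdiff(pool, w), 1),  # draw from the pool minus the current value
  character(1),
  USE.NAMES = FALSE
)

# every targeted position changed, everything else is untouched
stopifnot(all(word[positions] != original[positions]))
stopifnot(identical(word[-positions], original[-positions]))
```

Because setdiff(pool, w) is a character vector, sample never falls into its special case for a single number, so this also behaves when only one candidate is left.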
CodePudding user response:
Since the probability of sampling a duplicate is small, vectorized repeated sampling will be very performant.
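rreplace <- function(x, y, i) {
  v <- x  # keep the original values to compare against
  while (length(i)) {
    # draw a replacement for every position still pending
    x[i] <- sample(y, length(i), replace = TRUE)
    # keep only the positions whose new value equals the original
    i <- i[v[i] == x[i]]
  }
  x
}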
word <- c("A","E","C","A","R","O","P")
positions <- c(1,5,3,7)
letters <- c("A","B","C","D","E","F")
rreplace(word, letters, positions)
#> [1] "C" "E" "D" "A" "A" "O" "F"
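To put a rough number on "the probability of sampling a duplicate is small", the sketch below works it out for the small example. It assumes the candidate vector has no repeated values and calls it pool (a hypothetical name). A draw can only collide when the current value is itself in the pool, and then it does so with probability 1/n per draw.

```r
word      <- c("A", "E", "C", "A", "R", "O", "P")
positions <- c(1, 5, 3, 7)
pool      <- c("A", "B", "C", "D", "E", "F")  # hypothetical name for the candidate vector

n <- length(pool)

# a draw can only collide when the current value is itself in the pool
can_collide <- word[positions] %in% pool

# expected share of positions that need a redraw after one vectorized pass
p_redraw <- mean(can_collide) / n

# expected number of draws for a single position that can collide (geometric)
draws_per_position <- n / (n - 1)
```

Here half of the targeted positions can collide, so about 1 in 12 positions is expected to need a second pass, and a position that can collide needs 1.2 draws on average.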
A larger example for benchmarking:
word <- sample(LETTERS, 1e5, replace = TRUE)
letters <- LETTERS[1:15]
positions <- sample(length(word), 1e4)
# check that the correct words were replaced
word2 <- rreplace(word, letters, positions)
all(word[positions] != word2[positions])
#> [1] TRUE
all(word[-positions] == word2[-positions])
#> [1] TRUE
microbenchmark::microbenchmark(rreplace = rreplace(word, letters, positions),
RobertHacken = sapply(word[positions], function(w) sample(setdiff(letters, w), 1)))
#> Unit: milliseconds
#>         expr      min        lq       mean    median       uq      max neval
#>     rreplace   1.1181   1.33035   1.665195   1.61985   1.8919   3.9958   100
#> RobertHacken 104.9374 145.25685 151.923915 156.94295 165.9491 198.5219   100
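If a retry loop is unwanted, a different technique is to avoid collisions by construction: draw an index from n - 1 slots and step over the index of the current value. The sketch below is not from either answer and is not benchmarked here. replace_shift and pool are hypothetical names, and it assumes pool has no duplicates and at least two elements.

```r
# Sketch of a rejection-free replacement; `replace_shift` and `pool` are hypothetical names.
# Assumes `pool` has no duplicates and at least two elements.
replace_shift <- function(x, pool, i) {
  n <- length(pool)
  k <- match(x[i], pool)        # index of each current value in pool, NA if absent
  in_pool <- !is.na(k)

  draw <- integer(length(i))
  # values not in the pool cannot collide: draw from the whole pool
  draw[!in_pool] <- sample.int(n, sum(!in_pool), replace = TRUE)
  # values in the pool: draw from n - 1 slots, then skip over the current index
  d <- sample.int(n - 1L, sum(in_pool), replace = TRUE)
  draw[in_pool] <- d + (d >= k[in_pool])

  x[i] <- pool[draw]
  x
}

set.seed(1)  # only to make the sketch reproducible
word      <- c("A", "E", "C", "A", "R", "O", "P")
positions <- c(1, 5, 3, 7)
pool      <- c("A", "B", "C", "D", "E", "F")

out <- replace_shift(word, pool, positions)
```

Each targeted position gets exactly one draw, uniform over the pool values other than its current one, so the running time does not depend on how often collisions would have occurred.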