I have an assignment and have absolutely no idea how to get started.
I have to create variations of each word in a list, in which the characters between the first and the last are replaced with '*' at different positions.
It should look something like this:
input: c('smog', 'sting')
desired output: 's*og', 'sm*g', 's**g', 's*ing', 'st*ng', 'sti*g', 's***g'
Any idea how to achieve something like this?
Thank you very much
UPDATE I've found this solution:
s <- c( 'smog')
f <- function(x,y) {substr(x,y,y) <- "*"; x}
g <- function(x) Reduce(f,x,s)
unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))
output:
[1] "s*og" "sm*g" "s**g"
The only problem is that it works only when the vector contains a single word, not several.
CodePudding user response:
See also this SO post for related techniques: Create all combinations of letter substitution in string
EDIT
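Based on the OP's edit and comment, the OP's code can be wrapped in a function and applied to each word:
test2 <- c('smog', 'sting')
repfun2 <- function(s){
  # replace the character at position y of x with "*"
  f <- function(x,y) {substr(x,y,y) <- "*"; x}
  # apply f at every position of one combination, starting from s
  g <- function(x) Reduce(f,x,s)
  # all combinations of 1 to n-2 interior positions (2 to n-1)
  out <- unlist(lapply(1:(nchar(s)-2),function(x) combn(2:(nchar(s)-1),x,g)))
  return(out)
}
lapply(test2, FUN = repfun2)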
Output:
> lapply(test2, FUN = repfun2)
[[1]]
[1] "s*og" "sm*g" "s**g"
[[2]]
[1] "s*ing" "st*ng" "sti*g" "s**ng" "s*i*g" "st**g" "s***g"
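One caveat with the function above: for a three-letter word, 2:(nchar(s)-1) is the single number 2, and combn() expands a single positive integer n to seq_len(n), so the first character would also be masked. Words shorter than three letters misbehave as well, because 1:(nchar(s)-2) counts downwards. Below is a minimal sketch that guards against both cases. The name mask_interior is hypothetical and not part of the original answer; it works on indices into the interior positions rather than on the positions themselves.

```r
# Hypothetical helper (not from the original answer): returns every
# variation of `word` with one or more interior characters masked by "*".
mask_interior <- function(word) {
  n <- nchar(word)
  if (n < 3) return(character(0))      # no interior characters to mask
  interior <- 2:(n - 1)                # positions between first and last
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  out <- lapply(seq_along(interior), function(k) {
    # combine indices 1..length(interior), then map them back to
    # positions, so a single interior position is handled correctly
    combn(length(interior), k, FUN = function(idx) {
      masked <- chars
      masked[interior[idx]] <- "*"
      paste(masked, collapse = "")
    })
  })
  as.vector(unlist(out))
}

words <- c("smog", "sting", "cat")
unlist(lapply(words, mask_interior))
```

Wrapping the lapply() call in unlist() gives one flat character vector, as in the desired output of the question. A word with n characters yields 2^(n-2) - 1 variations, so the result grows quickly for long words.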
Previous answer for random replacement
I understand you want a random replacement of characters in a vector of strings. If this is correct, here is an idea:
test2 <- c('smog', 'sting')
repfun <- function(.string) {
  n_char <- nchar(.string)
  # random selection of n characters that will be replaced in the string
  repchar <- sample(1:n_char, size = sample(1:n_char, size = 1))
  # replacing the characters in the string
  for(i in seq_along(repchar)) substring(.string, repchar[i], repchar[i]) <- "*"
  return(.string)
}
lapply(test2, FUN = repfun)
Some outputs:
> lapply(test2, FUN = repfun)
[[1]]
[1] "*mog"
[[2]]
[1] "s*ing"
> lapply(test2, FUN = repfun)
[[1]]
[1] "s*o*"
[[2]]
[1] "s*i*g"
The basic idea is:
- Determine the number of characters in a string,
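- Randomly sample positions in the string, with the number of positions also drawn at random between 1 and its length,
- Replace the characters at the sampled positions with "*",
- Use lapply to apply the function to each element of a vector of character strings.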
I think you can improve it by removing the for loop if needed; see some ideas here and here
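As one possible way to drop the for loop, the string can be split into single characters so that all sampled positions are replaced in one vectorised assignment. This is only a sketch under the same assumptions as repfun above (any position, including the first and last, may be replaced); the name repfun_vec is hypothetical.

```r
# Hypothetical loop-free variant of repfun(); not from the original answer.
repfun_vec <- function(.string) {
  n_char <- nchar(.string)
  if (n_char == 0) return(.string)     # nothing to replace in an empty string
  chars <- strsplit(.string, "", fixed = TRUE)[[1]]
  # draw how many characters to replace, then which positions
  repchar <- sample.int(n_char, size = sample.int(n_char, size = 1))
  chars[repchar] <- "*"                # vectorised replacement, no loop
  paste(chars, collapse = "")
}

test2 <- c('smog', 'sting')
vapply(test2, repfun_vec, character(1), USE.NAMES = FALSE)
```

vapply() is used here instead of lapply() only to get a character vector back rather than a list; the results are random, so they differ between runs unless set.seed() is called first.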