Using R, I have generated several strings of letters, each 6 to 25 characters long. For each string, I'd like to generate every combination in which each "I" may be substituted for an "L" and vice versa; the order of the characters should stay the same.
For example:
Input
"IVGLWEA"
Output
"IVGLWEA" "LVGLWEA" "LVGIWEA" "IVGIWEA"
many thanks
rob
CodePudding user response:
Edit: Thanks to @Skaqqs for the dynamic solution!
string <- "IVGLWEA"
# count the I's and L's in the string (gives 0 when there are none,
# whereas length(unlist(gregexpr(...))) would give 1 for the -1 "no match" value)
n <- lengths(regmatches(string, gregexpr("I|L", string)))
# make a grid of all possible combinations with this amount of I's and L's
df <- expand.grid(rep(list(c("I", "L")), n))
# replace I's and L's with the sprintf placeholder %s
string_ <- gsub("I|L", "%s", string)
# replace %s with letters in grid
do.call(sprintf, as.list(c(string_, df)))
Result:
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"
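Since the question mentions several strings, the same idea can be wrapped in a function and applied to each one. This is only a sketch: the name il_variants and the example vector words are made up for illustration, and it assumes the strings contain no literal % characters (which sprintf would misread).

```r
# Hypothetical helper: all I/L variants of one string
il_variants <- function(string) {
  # number of I/L positions (0 when the string has none)
  n <- lengths(regmatches(string, gregexpr("[IL]", string)))
  if (n == 0) return(string)
  # one column per I/L position, one row per combination
  grid <- expand.grid(rep(list(c("I", "L")), n), stringsAsFactors = FALSE)
  # turn the string into a sprintf template, e.g. "%sVG%sWEA"
  template <- gsub("[IL]", "%s", string)
  do.call(sprintf, c(list(template), unname(as.list(grid))))
}

# Example input (assumed, not from the question)
words <- c("IVGLWEA", "SIINFEKL")
variants <- lapply(words, il_variants)
names(variants) <- words
```

The result is a named list with one character vector per input string; wrap it in unlist() if a single flat vector is preferred.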
CodePudding user response:
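Here's an extremely inefficient (but concise!) approach:
Create all potential combinations of your input characters and use regex to extract the desired pattern. For this 7-letter example that means building 7^7 = 823,543 candidate strings to find 4 matches, so it is not practical for strings approaching 25 characters.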
pattern <- "(I|L)VG(I|L)WEA"
b <- c("I", "V", "G", "L", "W", "E", "A")
strings <- apply(expand.grid(rep(list(b), 7)), 1, paste0, collapse = "")
grep(pattern, strings, value = TRUE)
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"
CodePudding user response:
Here is a solution in R. I'm sure others can provide something more concise and versatile, but I think this code works for your specific scenario of finding every combination of replacement for two letters. The result is returned in a single vector, but you can remove unlist() if you want each word separated.
I hope this helps!
# The function below works only for 2 target letters -
# it would have to be extended to work with more
target_letters <- c("I", "L")
words <- c("IVGLWEA", "SIINFEKL")
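unlist(lapply(words, function(word) {
  all_letters <- strsplit(word, '')[[1]]
  target_letter_positions <- which(all_letters %in% target_letters)
  # start from the variant with every target position set to the first letter
  all_letters[target_letter_positions] <- target_letters[1]
  c(paste(all_letters, collapse = ''),
    lapply(seq_along(target_letter_positions), function(x) {
      # choose indices into target_letter_positions rather than the positions
      # themselves: combn() expands a single number n to 1:n, which would give
      # wrong results for words with exactly one I or L
      combn(
        x = seq_along(target_letter_positions),
        m = x,
        FUN = function(y) {
          s <- all_letters
          s[target_letter_positions[y]] <- target_letters[2]
          paste(s, collapse = '')
        }
      )
    })
  )
}))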
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA" "SIINFEKI" "SLINFEKI" "SILNFEKI" "SIINFEKL" "SLLNFEKI" "SLINFEKL" "SILNFEKL" "SLLNFEKL"
You can strip back the outer lapply to simplify if you only want to use it on one word at a time, and can turn it into a function to be applied to a table column.
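As a rough sketch of that last suggestion, the single-word version could look like the following. The function name il_variants_one, the data frame peptides and its column sequence are all hypothetical, chosen only to illustrate the idea.

```r
# Hypothetical helper: all variants of a single word for two target letters
il_variants_one <- function(word, target_letters = c("I", "L")) {
  all_letters <- strsplit(word, "")[[1]]
  pos <- which(all_letters %in% target_letters)
  # base variant: every target position set to the first letter
  all_letters[pos] <- target_letters[1]
  out <- paste(all_letters, collapse = "")
  # switch every subset of size m of the target positions to the second letter
  for (m in seq_along(pos)) {
    out <- c(out, combn(seq_along(pos), m, FUN = function(idx) {
      s <- all_letters
      s[pos[idx]] <- target_letters[2]
      paste(s, collapse = "")
    }))
  }
  out
}

# Hypothetical table with one sequence per row
peptides <- data.frame(sequence = c("IVGLWEA", "SIINFEKL"),
                       stringsAsFactors = FALSE)
# Store the variants of each row in a list column
peptides$variants <- lapply(peptides$sequence, il_variants_one)
```

A list column keeps the variants next to their source sequence; peptides$variants[[1]] then holds the four variants of the first row.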
All the best, IW