I have the following pair of strings:
core_string <- "AFFVQTCRE"
mask_string <- "*KKKKKKKK"
What I want to do is mask core_string with mask_string.
Wherever mask_string has a *, we keep the corresponding character of core_string;
at every other position we replace it with the character from mask_string.
So the desired result is:
AKKKKKKKK
Another example:
core_string <- "AFFVQTCRE"
mask_string <- "*KKKK*KKK"
# result AKKKKTKKK
The length of both strings is always the same. How can I do that with R?
CodePudding user response:
regmatches in its replacement form, regmatches<-, can be handy here:
regmatches(core_string, gregexpr("K", mask_string)) <- "K"
core_string
#[1] "AKKKKKKKK"
If the replacement characters vary from position to position (a 1:1 match with the mask) rather than being a constant "K", then it has to be changed up a little:
ss <- strsplit(mask_string, "")[[1]]
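# wrap in list(): with gregexpr match data, each list element holds the replacements for one input string
regmatches(core_string, gregexpr("[^*]", mask_string)) <- list(ss[ss != "*"])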
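As a quick check of the varying-character case, here is a minimal sketch. The mask "*XYZW*KLM" is a made-up example, not from the question, and taking the replacement values straight from mask_string with regmatches() is just one way to build the list that the replacement form expects.

```r
core_string <- "AFFVQTCRE"
mask_string <- "*XYZW*KLM"   # hypothetical mask with varying characters

# positions in the mask that are not "*"
m <- gregexpr("[^*]", mask_string)

# regmatches(mask_string, m) is a list with one character vector per string,
# which is the shape the replacement form works with for gregexpr match data
regmatches(core_string, m) <- regmatches(mask_string, m)
core_string
# [1] "AXYZWTKLM"
```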
CodePudding user response:
Here's a helper function that will do just that:
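apply_mask <- function(x, mask) {
  # z: characters of one core string, m: characters of its mask
  unlist(Map(function(z, m) {
    # where the mask has "*", take the character from the core string
    m[m == "*"] <- z[m == "*"]
    paste(m, collapse = "")
  }, strsplit(x, ""), strsplit(mask, "")))
}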
Basically, you split each string into characters, replace every "*" in the mask with the core character at the same position, then paste the characters back together.
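For a single pair of strings, those steps can be sketched one at a time. The variable names here (x, mask, keep, result) are only for illustration:

```r
x    <- strsplit("AFFVQTCRE", "")[[1]]   # core characters
mask <- strsplit("*KKKK*KKK", "")[[1]]   # mask characters

keep <- mask == "*"       # TRUE where the core character survives
mask[keep] <- x[keep]     # copy those core characters into the mask
result <- paste(mask, collapse = "")
result
# [1] "AKKKKTKKK"
```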
I used Map to make sure the function is still vectorized over the inputs. For example:
core_string <- c("AFFVQTCRE", "ABCDEFGHI")
mask_string <- "*KKKK*KKK"
apply_mask(core_string, mask_string)
# [1] "AKKKKTKKK" "AKKKKFKKK"
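The question says both strings always have the same length. If you would rather fail loudly when that is not the case, a guard can be added up front. This is only a sketch: apply_mask_checked is a hypothetical name, and the stopifnot() check is my addition rather than part of the helper above.

```r
apply_mask_checked <- function(x, mask) {
  # hypothetical guard: every core string must be as long as its mask
  stopifnot(all(nchar(x) == nchar(mask)))
  unlist(Map(function(z, m) {
    # keep the core character wherever the mask has "*"
    m[m == "*"] <- z[m == "*"]
    paste(m, collapse = "")
  }, strsplit(x, ""), strsplit(mask, "")))
}

apply_mask_checked("AFFVQTCRE", "*KKKK*KKK")
# [1] "AKKKKTKKK"
```

A call with mismatched lengths, such as apply_mask_checked("ABC", "*K"), stops with an error instead of silently recycling characters.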