How to mask a string based on a pattern of string of same length

Time:03-04

I have the following pair of strings:

core_string     <- "AFFVQTCRE"
mask_string     <- "*KKKKKKKK"

What I want to do is mask core_string with mask_string: wherever mask_string has a *, keep the character at that position in core_string; everywhere else, replace it with the corresponding character from mask_string.

So the desired result is:

   AKKKKKKKK

Another example:

core_string     <- "AFFVQTCRE"
mask_string     <- "*KKKK*KKK"
 #     result       AKKKKTKKK

The length of both strings is always the same. How can I do that with R?

CodePudding user response:

regmatches in its replacement form, regmatches<-, can be handy here:

regmatches(core_string, gregexpr("K", mask_string)) <- "K"
core_string
#[1] "AKKKKKKKK"

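If each position should take its own character from the mask rather than a constant, it has to be changed up a little. Note the list() around the replacement values: with gregexpr match data, regmatches<- expects one character vector per input string.

ss <- strsplit(mask_string, "")[[1]]
regmatches(core_string, gregexpr("[^*]", mask_string)) <- list(ss[ss != "*"])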
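
As a quick check of the per-position variant, here is a small sketch with a mask whose replacement characters differ. The mask "*XY*Z*KKK" and the variable name result are made up for illustration and are not from the question. The replacement values are wrapped in list() on the assumption that regmatches<- with gregexpr match data takes one character vector per input string.

```r
core_string <- "AFFVQTCRE"
mask_string <- "*XY*Z*KKK"   # hypothetical mask with mixed replacement characters

# Split the mask into single characters
ss <- strsplit(mask_string, "")[[1]]

# Work on a copy so core_string itself stays unchanged
result <- core_string

# Positions come from the non-"*" characters of the mask;
# each one is replaced by the mask character at the same position
regmatches(result, gregexpr("[^*]", mask_string)) <- list(ss[ss != "*"])

result
# [1] "AXYVZTKKK"
```

This relies on both strings having the same length, as stated in the question, because the match positions are computed on mask_string and then applied to core_string.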

CodePudding user response:

Here's a helper function that will do just that:

apply_mask <- function(x, mask) {
  unlist(Map(function(z, m) {
    m[m=="*"]  <- z[m=="*"]
    paste(m, collapse="")
  }, strsplit(x, ""), strsplit(mask, "")))
}

Basically, you split both strings into characters, fill each "*" position of the mask with the character at the same position of the original string, then paste the characters back together.
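
To make those steps concrete, here is a rough walk-through for a single string, outside of any function. The intermediate names (z, m, keep, masked) are only for illustration.

```r
x    <- "AFFVQTCRE"
mask <- "*KKKK*KKK"

# 1. Split both strings into vectors of single characters
z <- strsplit(x, "")[[1]]
m <- strsplit(mask, "")[[1]]

# 2. Find the positions where the mask says "keep the original"
keep <- m == "*"

# 3. Copy the original characters into those positions of the mask
m[keep] <- z[keep]

# 4. Glue the characters back into one string
masked <- paste(m, collapse = "")
masked
# [1] "AKKKKTKKK"
```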

I used Map so that the function stays vectorized over its inputs. For example:

core_string     <- c("AFFVQTCRE", "ABCDEFGHI")
mask_string     <- "*KKKK*KKK"

apply_mask(core_string, mask_string)
# [1] "AKKKKTKKK" "AKKKKFKKK"