I have the following strings:
x <- "??????????DRHRTRHLAK??????????"
x2 <- "????????????????????TRCYHIDPHH"
x3 <- "FKDHKHIDVK????????????????????TRCYHIDPHH"
x4 <- "FKDHKHIDVK????????????????????"
What I want to do is replace all the ? characters with another string:
rep <- "ndqeegillkkkkfpssyvv"
Resulting in:
ndqeegillkDRHRTRHLAKkkkfpssyvv # x
ndqeegillkkkkfpssyvvTRCYHIDPHH # x2
FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH # x3
FKDHKHIDVKndqeegillkkkkfpssyvv # x4
Basically, the order of rep is kept in the replacement, and the characters already present in the string (DRHRTRHLAK in x) stay where they are between the pieces of rep.
The total length of rep is the same as the total number of ? characters: 20.
Note that I don't want to split rep manually as an extra step.
I tried this but failed:
> gsub(pattern = "\\?+", replacement = rep, x = x)
[1] "ndqeegillkkkkfpssyvvDRHRTRHLAKndqeegillkkkkfpssyvv"
CodePudding user response:
You can count the number of ?'s and then cut rep based on that:
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
pattern <- "(\\?+)(DRHRTRHLAK)(\\?+)"
# n is the length of the leading run of ?'s (capture group 1)
n <- nchar(gsub(pattern, "\\1", x))
gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n + 1, nchar(rep))), x)
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
Edit, for the new examples:
A very verbose way is an if/else chain that checks where the ?'s are and substitutes rep accordingly.
# pattern is redefined in each branch, since the earlier one hard-codes DRHRTRHLAK
if (grepl("^\\?.+\\?$", x)) { # ?'s on both ends
  pattern <- "^(\\?+)([A-Z]+)(\\?+)$"
  n <- nchar(gsub(pattern, "\\1", x))
  gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n + 1, nchar(rep))), x)
} else if (grepl("^\\?", x)) { # ?'s only at the start
  pattern <- "^(\\?+)([A-Z]+)$"
  n <- nchar(gsub(pattern, "\\1", x))
  gsub(pattern, paste0(substr(rep, 1, n), "\\2"), x)
} else if (grepl("\\?$", x)) { # ?'s only at the end
  pattern <- "^([A-Z]+)(\\?+)$"
  n <- nchar(gsub(pattern, "\\2", x))
  gsub(pattern, paste0("\\1", substr(rep, 1, n)), x)
} else if (grepl("^[A-Z]+\\?+[A-Z]+$", x)) { # ?'s only in the middle
  pattern <- "^([A-Z]+)(\\?+)([A-Z]+)$"
  n <- nchar(gsub(pattern, "\\2", x))
  gsub(pattern, paste0("\\1", substr(rep, 1, n), "\\3"), x)
}
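For comparison, the same four cases can be handled without branching by working character by character. This is only a sketch of an alternative technique, not the chain above: fill_marks is a hypothetical helper name, and it assumes rep has exactly one character per ? in the string.

```r
# Hypothetical helper: fill the ?'s of a single string with the
# characters of rep, in order, leaving all other characters in place.
fill_marks <- function(s, rep) {
  chars <- strsplit(s, "")[[1]]    # string -> vector of single characters
  fill  <- strsplit(rep, "")[[1]]
  is_q  <- chars == "?"
  # Assumption: rep supplies exactly one character per ?
  stopifnot(sum(is_q) == length(fill))
  chars[is_q] <- fill
  paste(chars, collapse = "")
}

rep <- "ndqeegillkkkkfpssyvv"
fill_marks("??????????DRHRTRHLAK??????????", rep)
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
fill_marks("FKDHKHIDVK????????????????????", rep)
#[1] "FKDHKHIDVKndqeegillkkkkfpssyvv"
```

Because the positions of the ?'s are found by comparison rather than by regex, it does not matter whether they sit at the start, the end, the middle, or both ends.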
CodePudding user response:
String split with substr():
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
x <- gsub(pattern = "^\\?+", replacement = substr(rep, 1, 10), x = x)
x <- gsub(pattern = "\\?+$", replacement = substr(rep, 11, 20), x = x)
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
In the regex, ^ matches the start of the string and $ matches the end.
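The offsets 10, 11 and 20 are specific to this x. A possible way to derive them from the string itself is to measure the leading run of ?'s with regexpr(). This is a minimal sketch that assumes x has ?'s at both ends and that rep is as long as the two runs combined; the variable n is introduced here for illustration.

```r
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"

# Length of the leading run of ?'s, reported by regexpr() as "match.length"
n <- attr(regexpr("^\\?+", x), "match.length")

# First n characters of rep fill the leading run, the rest fill the trailing run
x <- sub("^\\?+", substr(rep, 1, n), x)
x <- sub("\\?+$", substr(rep, n + 1, nchar(rep)), x)
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
```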
CodePudding user response:
Example data:
x <- c(
"??????????DRHRTRHLAK??????????",
"????????????????????TRCYHIDPHH",
"FKDHKHIDVK????????????????????TRCYHIDPHH"
)
rep <- "ndqeegillkkkkfpssyvv"
Fix it up with regmatches<- replacements in a vectorised fashion:
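gr <- gregexpr("\\?+", x)
# cumulative match lengths give the end position in rep of each run of ?'s
csml <- lapply(gr, \(m) cumsum(attr(m, "match.length")))
# each run starts one character after the previous run ended
regmatches(x, gr) <- lapply(csml, \(e) substring(rep, c(1, head(e, -1) + 1), e))
x
##[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
##[2] "ndqeegillkkkkfpssyvvTRCYHIDPHH"
##[3] "FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH"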
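To check the idea against all four strings from the question (the example data above leaves out the fourth), the steps can be wrapped in a function. This is a sketch under assumptions: fill_runs and x_all are hypothetical names, every element is assumed to contain at least one run of ?'s, and rep is assumed to be as long as the runs in each element combined. The \(x) lambda syntax is replaced by function() so it also runs on R versions before 4.1.

```r
# Hypothetical wrapper around the regmatches<- approach
fill_runs <- function(x, rep) {
  gr <- gregexpr("\\?+", x)
  regmatches(x, gr) <- lapply(gr, function(m) {
    ends   <- cumsum(attr(m, "match.length"))  # where each run ends in rep
    starts <- c(1, head(ends, -1) + 1)         # one past the previous end
    substring(rep, starts, ends)
  })
  x
}

x_all <- c(
  "??????????DRHRTRHLAK??????????",
  "????????????????????TRCYHIDPHH",
  "FKDHKHIDVK????????????????????TRCYHIDPHH",
  "FKDHKHIDVK????????????????????"
)
fill_runs(x_all, "ndqeegillkkkkfpssyvv")
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
#[2] "ndqeegillkkkkfpssyvvTRCYHIDPHH"
#[3] "FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH"
#[4] "FKDHKHIDVKndqeegillkkkkfpssyvv"
```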