I have this string:
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
Given another string:
bb_seq <- "rhhhhitv"
What I'd like to do is replace each "?" with a character from bb_seq, keeping the order of bb_seq. The total number of "?" is guaranteed to equal the number of characters in bb_seq. The result should be:
KrEDhhHRDDKDKDhHEhREKEitDEvKKK
How can I achieve that with R?
I tried this but failed:
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"
sp <- seed_pattern
gr <- gregexpr("\\?+", sp)
csml <- lapply(gr, function(sp) cumsum(attr(sp, "match.length")))
regmatches(sp, gr) <- lapply(csml, function(sp) substring(bb_seq, c(1, sp[1]), sp))
sp
# KrEDrhhHRDDKDKDrhhhHErhhhhREKErhhhhitDErhhhhitvKKK
I'm open to non-regex solutions.
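For reference, the output of that attempt comes from the start positions passed to substring(): with runs of "?" matched by "\\?+", c(1, sp[1]) is c(1, 1) here and gets recycled, so every run receives a prefix of bb_seq starting at its first character instead of the next unused characters. A minimal sketch of a corrected version of the same approach, assuming the "\\?+" run pattern, starts each piece one position after the previous cumulative end:

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

sp <- seed_pattern
# Match each run of consecutive "?" characters
gr <- gregexpr("\\?+", sp)
# Cumulative end position in bb_seq for each run
csml <- lapply(gr, function(m) cumsum(attr(m, "match.length")))
# Each run takes the characters right after the previous run's end
regmatches(sp, gr) <- lapply(csml, function(ends) {
  starts <- c(1, head(ends, -1) + 1)
  substring(bb_seq, starts, ends)
})
sp
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
```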
Answer:
Split, replace, combine:
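# split both strings into single characters
target <- strsplit(seed_pattern, "")[[1]]
replacement <- strsplit(bb_seq, "")[[1]]
# overwrite the "?" positions, in order, with the replacement characters
target[target == "?"] <- replacement
# glue the characters back into one string
paste(target, collapse = "")
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"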
Answer:
You can do this (perhaps not very efficiently) by replacing one "?" at a time:
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"
for (ch in unlist(strsplit(bb_seq, ""))) {
  # sub() replaces only the leftmost "?", so each pass fills the next one
  seed_pattern <- sub("?", ch, seed_pattern, fixed = TRUE)
}
print(seed_pattern)
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
Sadly, sub() is not vectorized over the replacement argument!
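Since sub() takes one replacement per call, the loop can also be written as a fold with base R's Reduce(), threading the partially filled string through each character of bb_seq. A sketch of that rewrite; it does the same one-at-a-time work as the loop, so it is no faster:

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

# Fold over the characters of bb_seq; each step fills the leftmost remaining "?"
filled <- Reduce(
  function(s, ch) sub("?", ch, s, fixed = TRUE),
  strsplit(bb_seq, "")[[1]],
  seed_pattern
)
filled
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
```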
Answer:
Here is a long way round. I still can't do these things without thinking in tibbles or data frames; hopefully someday I will grasp this:
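library(dplyr)
library(tidyr)
tibble(seed_pattern, bb_seq) %>%
  # one row per piece between "?" marks (bb_seq is repeated on every row)
  separate_rows(seed_pattern, sep = '\\?') %>%
  # append the row_number()-th character of bb_seq to each piece, then glue all rows together
  mutate(seed_pattern = paste(paste0(seed_pattern, substr(bb_seq, row_number(), row_number())), collapse = "")) %>%
  # every row now holds the same full string; keep one
  slice(1) %>%
  pull(seed_pattern)
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"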
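The pipeline above amounts to splitting on "?" and interleaving the pieces with the characters of bb_seq. A base-R sketch of the same idea (a translation, not the answerer's code) uses regmatches() with invert = TRUE, which returns the text between matches and keeps empty pieces, so n question marks always yield n + 1 pieces:

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

# Text between the "?" marks, including empty pieces such as the one inside "??"
pieces <- regmatches(seed_pattern,
                     gregexpr("?", seed_pattern, fixed = TRUE),
                     invert = TRUE)[[1]]
# One fill character after each piece except the last
fill <- c(strsplit(bb_seq, "")[[1]], "")
result <- paste0(pieces, fill, collapse = "")
result
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
```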
Answer:
You can do this in a one-liner with a slight change to the solution you received from your earlier question (thanks @thelatemail):
regmatches(seed_pattern, gregexpr("\\?", seed_pattern)) <- strsplit(bb_seq, "")
Check that it gives the expected result:
seed_pattern == "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
# [1] TRUE
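Because gregexpr() and strsplit() each return one list element per input string, the same assignment should also work on a character vector of patterns, pairing each pattern with its own fill string. A sketch with made-up example vectors seeds and fills (each fill needs exactly as many characters as its pattern has "?"):

```r
seeds <- c("K?ED??HRDDKDKD?HE?REKE??DE?KKK", "A?C?")
fills <- c("rhhhhitv", "bd")

# Both sides are lists with one element per string, matched up element by element
regmatches(seeds, gregexpr("\\?", seeds)) <- strsplit(fills, "")
seeds
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK" "AbCd"
```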