How to replace a character in a string with characters in a vector by preserving its order using R

Time:11-17

I have this string:

 seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"

and another string:

bb_seq <- "rhhhhitv"

What I'd like to do is replace each ? with a character from bb_seq, preserving the order of bb_seq. The number of ? characters is guaranteed to equal the length of bb_seq. The expected result is:

KrEDhhHRDDKDKDhHEhREKEitDEvKKK

How can I achieve that with R?

I tried this but failed:

  seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
  bb_seq <- "rhhhhitv"
  sp <- seed_pattern
  gr   <- gregexpr("\\?+", sp)
  csml <- lapply(gr, function(sp) cumsum(attr(sp, "match.length")))
  regmatches(sp, gr) <- lapply(csml, function(sp) substring(bb_seq, c(1, sp[1]), sp))
  sp

  # KrEDrhhHRDDKDKDrhhhHErhhhhREKErhhhhitDErhhhhitvKKK

I'm open to non-regex solutions.

CodePudding user response:

Split, replace, combine:

> target <- strsplit(seed_pattern, "")[[1]]
> replacement <- strsplit(bb_seq, "")[[1]]
> target[target=="?"] <- replacement
> paste(target, collapse = "")
[1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
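
The same split, replace, combine idea can be wrapped in a small function. This is only a sketch: the name fill_placeholders and its placeholder argument are made up for illustration, and the length check is an added assumption so that a count mismatch fails loudly instead of recycling silently.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer):
# fill each placeholder in `pattern` with successive characters of `fill`.
fill_placeholders <- function(pattern, fill, placeholder = "?") {
  target      <- strsplit(pattern, "")[[1]]   # pattern as single characters
  replacement <- strsplit(fill, "")[[1]]      # fill as single characters
  slots       <- target == placeholder        # logical mask of positions to fill

  # Assumption: the counts must match exactly; stop otherwise
  stopifnot(sum(slots) == length(replacement))

  target[slots] <- replacement                # assignment keeps the order of `fill`
  paste(target, collapse = "")
}

result_split <- fill_placeholders("K?ED??HRDDKDKD?HE?REKE??DE?KKK", "rhhhhitv")
print(result_split)
```

Because the logical index selects the ? positions from left to right, the characters of bb_seq land in their original order.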

CodePudding user response:

You can do this (perhaps not very efficiently) by replacing one ? at a time:

seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

# Replace the first remaining "?" with each character of bb_seq in turn
for (ch in unlist(strsplit(bb_seq, ""))) {
  seed_pattern <- sub("?", ch, seed_pattern, fixed = TRUE)
}

print(seed_pattern)
# [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"

Sadly sub is not vectorized over the replacement argument!
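
Since sub is not vectorized over the replacement, one possible way to write the same loop without modifying seed_pattern in place is to fold over the characters with Reduce. This is a sketch of the same one-at-a-time technique, not a faster method; the variable names chars and result_reduce are illustrative.

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

chars <- strsplit(bb_seq, "")[[1]]

# Start from seed_pattern and, for each character in turn,
# replace the first remaining "?" (fixed = TRUE treats "?" literally).
result_reduce <- Reduce(
  function(acc, ch) sub("?", ch, acc, fixed = TRUE),
  chars,
  seed_pattern
)

print(result_reduce)
```

One caveat with either form: if bb_seq itself contained a ?, the next iteration would replace that inserted character rather than the next original placeholder.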

CodePudding user response:

Here is a long way. I still can't do these things without thinking in tibbles or data frames. Hoping that someday I will grasp this:

library(dplyr)
library(tidyr)

tibble(seed_pattern, bb_seq) %>% 
  separate_rows(seed_pattern, sep='\\?') %>% 
  mutate(seed_pattern = paste(paste0(seed_pattern, substr(bb_seq, row_number(), row_number())), collapse = "")) %>% 
  slice(1) %>% 
  pull(seed_pattern)
[1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"

CodePudding user response:

You can do this in a one-liner with a slight change to the solution you received from your earlier question (thanks @thelatemail):

regmatches(seed_pattern, gregexpr("\\?", seed_pattern)) <- strsplit(bb_seq, "")

Check it provides the expected result:

seed_pattern == "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
[1] TRUE
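
For comparison, the attempt in the question can also be repaired if the pattern matches runs of ? (written here as "\\?+", which is an assumption about the original intent). Its output suggests that every run received the prefix of bb_seq up to the cumulative count, instead of only the next slice. A minimal sketch that takes the slice for each run:

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

sp  <- seed_pattern
gr  <- gregexpr("\\?+", sp)               # one match per run of "?"
len <- attr(gr[[1]], "match.length")      # length of each run
end   <- cumsum(len)                      # last bb_seq index used by each run
start <- end - len + 1                    # first bb_seq index used by each run

# Each run gets only its own slice of bb_seq
regmatches(sp, gr) <- list(substring(bb_seq, start, end))
print(sp)
```

Matching single ? characters, as in the one-liner above, avoids the bookkeeping entirely.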