R Combining strings between known strings

Time:11-10

I have a long character vector whose elements follow a certain structure. I would like to group the strings so that this structure becomes explicit. An example should make this clear.

chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop", "Start", "erg", "vdf", "vfd", "efw", "Stop")

So there is a random title, followed by blocks of words running from Start to Stop. Each block (Start and Stop included) should be combined into one vector, and the titles should be kept so that I know which title each block belongs to. The result would be something like this:

result <- list("Random Title" = list(c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop")),
                 "Another Random Title" = list(c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop")))
> result
$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

I don't know in advance how many strings there are between Start and Stop, and the titles are random. My data doesn't need to be a vector. I tried this with tibble and cumsum, but that fails because of the titles, which I need to keep.

My effort:

library(dplyr)  # provides tibble(), mutate() and %>%
res <- tibble(text = chr_vec) %>% 
  mutate(group = cumsum(text == "Start"))

This almost works, but the titles break this approach: each title after the first ends up in the group of the block that precedes it.
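To make the problem concrete, here is a small base R sketch of the same grouping. The shortened vector `small_vec` and the names `grp` and `groups` are made up for illustration and are not part of the original data.

```r
# Shortened, hypothetical version of the input: two titles, one block each
small_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop",
               "Another Random Title", "Start", "erg", "Stop")

# Same grouping rule as in the attempt above: a new group at every "Start"
grp <- cumsum(small_vec == "Start")

# Split the vector by group to see what ends up where
groups <- split(small_vec, grp)
groups
```

Group `1` here contains "Start", "dsf", "sdvf", "Stop" and then "Another Random Title", because the counter only increases at the next "Start".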

CodePudding user response:

A solution in base R

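# Positions of the block delimiters (exact matches, so a title that merely
# contains "Start" or "Stop" is not picked up)
t1 <- which(chr_vec == "Start")
t2 <- which(chr_vec == "Stop")
# One index sequence per Start-Stop block; Map() always returns a list,
# whereas mapply() would simplify equal-length blocks into a matrix
sek <- Map(seq, t1, t2)

j <- 1
lst <- list()
for (i in seq_along(sek)) {
  if (i == 1) {
    # The first element of the vector is taken as the first title
    tit <- chr_vec[1]
  } else if (head(sek[[i]], 1) - tail(sek[[i - 1]], 1) != 1) {
    # A gap between the previous Stop and this Start means a new title
    tit <- chr_vec[head(sek[[i]], 1) - 1]
    j <- 1
  }
  # Store the block as the j-th entry under the current title
  lst[[tit]][[j]] <- chr_vec[sek[[i]]]
  j <- j + 1
}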

resulting in

$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"
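
The cumsum idea from the question can probably also be made to work without a loop. The following is only a sketch, under two assumptions: the vector begins with a title, and every element outside a Start-Stop block is a title. All helper names (`is_start`, `in_block`, `title_id` and so on) are made up for illustration.

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop",
             "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

is_start <- chr_vec == "Start"
is_stop  <- chr_vec == "Stop"

# TRUE for every element from a Start through its matching Stop
in_block <- cumsum(is_start) - cumsum(is_stop) + is_stop > 0
# Assumption: anything outside a block is a title
is_title <- !in_block

title_id <- cumsum(is_title)  # running index of the current title
block_id <- cumsum(is_start)  # running index of the current block

# One character vector per block
blocks <- split(chr_vec[in_block], block_id[in_block])
# Title that each block falls under (assumes the vector starts with a title)
block_title <- chr_vec[is_title][title_id[is_start]]
title_f <- factor(block_title, levels = unique(block_title))

# Group the blocks by title, keeping titles in order of first appearance
res <- lapply(split(seq_along(blocks), title_f), function(i) unname(blocks[i]))
str(res)
```

Note that blocks under repeated, identical titles would be merged into one entry here, so this only fits data where titles are unique.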