Paste strings between patterns in R

Time:11-06

I would like to paste/collapse elements of a vector together, starting at each index where a particular string appears and continuing up to (but not including) its next appearance. For example,

v <- c("foo", "bar1", "bar2", "foo", "bar1", "foo", "bar1", "bar2", "bar3", "foo")
v
[1] "foo"  "bar1" "bar2" "foo"  "bar1" "foo"  "bar1" "bar2" "bar3" "foo" 

I can do this using

c(paste0(v[1:3], collapse = ""), paste0(v[4:5], collapse = ""), paste0(v[6:9], collapse = ""), paste0(v[10:10], collapse = ""))
[1] "foobar1bar2"     "foobar1"         "foobar1bar2bar3" "foo"  

The pattern is always "foo", but the number of elements after each "foo" varies (in this example it is 2, 1, 3 and 0). There are a large number of lines, so I'd prefer to avoid an explicit loop. I think I could use

b <- which(v == "foo")
b
[1]  1  4  6 10
sapply(1:(length(b)-1), function(x) paste0(v[b[x]:(b[x + 1]-1)], collapse = ""))
[1] "foobar1bar2"     "foobar1"         "foobar1bar2bar3"

but this misses the last "foo". Any help would be greatly appreciated!
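A minimal sketch of how the `which()` approach above could be completed. The last group goes missing because the end of each group is looked up as "the next `foo` minus one", and the final "foo" has no next one. Computing an explicit end index for every start, with the last group ending at `length(v)`, avoids that. The names `starts`, `ends` and `res` are illustrative only.

```r
v <- c("foo", "bar1", "bar2", "foo", "bar1", "foo", "bar1", "bar2", "bar3", "foo")

starts <- which(v == "foo")             # 1 4 6 10
ends   <- c(starts[-1] - 1, length(v))  # 3 5 9 10: the last group runs to the end of v

# Collapse v[start:end] for each (start, end) pair
res <- mapply(function(s, e) paste0(v[s:e], collapse = ""), starts, ends)
res
#> [1] "foobar1bar2"     "foobar1"         "foobar1bar2bar3" "foo"
```

Because `ends` always finishes at `length(v)`, any elements after the final "foo" would also be kept in the last group.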

Answer:

How about this:

sapply(split(v, cumsum(v=='foo')), paste0, collapse='')
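To see why this works, here is the same idea broken into steps (a sketch; `grp` and `res` are illustrative names). Each `TRUE` in `v == "foo"` bumps the running sum, so `cumsum()` gives every element the number of the group it belongs to. Note that `sapply()` over the `split()` list returns a character vector named by group id ("1" to "4"); this version uses `vapply()` plus `unname()` to get a plain, type-stable result instead.

```r
v <- c("foo", "bar1", "bar2", "foo", "bar1", "foo", "bar1", "bar2", "bar3", "foo")

# Running count of "foo" seen so far = group id for each element
grp <- cumsum(v == "foo")
grp
#> [1] 1 1 1 2 2 3 3 3 3 4

# split() gives one character vector per group; paste0() collapses each one.
# vapply() enforces a length-1 character result per group, and unname()
# drops the group ids that would otherwise be carried over as names.
res <- unname(vapply(split(v, grp), paste0, character(1), collapse = ""))
res
#> [1] "foobar1bar2"     "foobar1"         "foobar1bar2bar3" "foo"
```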

Answer:

I think you just need to handle that last "foo":

v <- c("foo", "bar1", "bar2", "foo", "bar1", "foo", "bar1", "bar2", "bar3", "foo")

b <- which(v == "foo")

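sapply(seq_along(b), function(i) {
  if (i < length(b)) {
    # paste from this "foo" up to the element just before the next "foo"
    paste(v[b[i]:(b[i + 1] - 1)], collapse = "")
  } else {
    # last "foo": paste through to the end of v, so trailing elements are kept
    paste(v[b[i]:length(v)], collapse = "")
  }
})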

This returns

#> [1] "foobar1bar2"     "foobar1"         "foobar1bar2bar3" "foo"      
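Both answers assume `v` starts with "foo". With the `cumsum()` approach, anything before the first "foo" lands in a group numbered 0 and comes out as an extra first element; the `which()` approach simply ignores it. A minimal sketch of a hypothetical helper, `collapse_groups()` (not part of either answer), that drops those leading elements and takes the marker string as a parameter:

```r
# Hypothetical helper: collapse each run of x that starts with `marker`.
# Elements before the first marker (group 0) are dropped.
collapse_groups <- function(x, marker = "foo") {
  grp  <- cumsum(x == marker)  # group id per element; 0 before the first marker
  keep <- grp > 0              # discard anything before the first marker
  unname(vapply(split(x[keep], grp[keep]), paste0, character(1), collapse = ""))
}

collapse_groups(c("junk", "foo", "bar1", "foo", "bar1", "bar2"))
#> [1] "foobar1"     "foobar1bar2"
```

Whether to drop the leading elements is a judgement call; to keep them as their own first group, remove the `keep` filter.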