Get string in between many other strings [R]

Time:10-07

Here I want to extract the string part "wanted1part". I could do it like this:

string <- "foo_bar_doo_xwanted1part_more_junk"
gsub("\\_.*", "", gsub(".*?_x", "", string))
#> [1] "wanted1part"

But I was hoping that someone could suggest a one-line solution?

CodePudding user response:

If you want to stick with using gsub, you can use a capture group that is backreferenced in the replacement:

gsub('^.+_x(\\w+?)_.+$', '\\1', string, perl = TRUE)

The key here is to have the pattern match the whole string while a capture group, specified using parentheses, matches only the part of the string you would like to keep. This group, here "(\\w+?)", then replaces the entire string when we reference it as "\\1" in the replacement.
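As a minimal sketch of the capture-group idea (the names `pattern`, `result`, and `unchanged` are only illustrative), note that `sub()` is enough when the pattern matches the whole string once, and that a string the pattern does not match is returned unchanged rather than as `NA`:

```r
string <- "foo_bar_doo_xwanted1part_more_junk"
pattern <- "^.+_x(\\w+?)_.+$"

# The pattern consumes the whole string; "\\1" keeps only the captured group.
result <- sub(pattern, "\\1", string, perl = TRUE)
print(result)

# Caveat: input without "_x" does not match, so it comes back untouched.
unchanged <- sub(pattern, "\\1", "nothing_to_see", perl = TRUE)
print(unchanged)
```

If you need to tell a real extraction apart from a non-match, check with `grepl(pattern, string, perl = TRUE)` first.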

I've found that using str_extract from stringr can make this kind of thing a bit easier, as it lets me avoid capture groups altogether.

library(stringr)
str_extract(string, '(?<=_x)\\w+?(?=_)')

Here, I use a lookbehind, (?<=_x), and a lookahead, (?=_), to identify the part of the string we want to extract without including the surrounding text in the match.
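If you would rather not load stringr, the same lookaround idea should also work in base R, assuming `perl = TRUE` so that lookbehind is supported. This is a sketch, not part of the original answer; `m` and `extracted` are illustrative names:

```r
string <- "foo_bar_doo_xwanted1part_more_junk"

# regexpr() locates the first match; regmatches() pulls out the matched text.
m <- regexpr("(?<=_x)\\w+?(?=_)", string, perl = TRUE)
extracted <- regmatches(string, m)
print(extracted)
```

Unlike the gsub approach, `regmatches()` returns a zero-length character vector when nothing matches, which makes a failed extraction easy to detect.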
