Regex to extract pattern from string until a symbol occurs, or the string ends

Time:08-24

I want to extract a pattern from a vector of strings, i.e. extract c("TAG a", "TAG b", "TAG c") from c("TAG a", "TAG b-3", "TAG c 3").

So far I've tried:

str_vec <- c("TAG a", "TAG b-3", "TAG c 2", "2 TAG d")

stringr::str_extract(str_vec, "TAG .*(?=[\\ \\-])")

This returns TAG b and TAG c correctly, but gives NA for TAG a and TAG d.

If I try

stringr::str_extract(str_vec, "TAG .*(?=[\\ \\-]|$)")

TAG a and TAG d are returned correctly, but $ seems to override the space/hyphen alternative, so TAG b and TAG c are returned with their suffixes still attached.
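The second attempt most likely fails because of greediness rather than because of $ itself: .* first consumes the rest of the string, and since the end of the string already satisfies the lookahead (?=[\\ \\-]|$), the engine never backtracks to the earlier space or hyphen. A minimal sketch of one alternative (not taken from the answers below) is to make the quantifier lazy with .*?, so the match stops at the first position followed by a space, a hyphen, or the end of the string:

```r
library(stringr)

str_vec <- c("TAG a", "TAG b-3", "TAG c 2", "2 TAG d")

# Greedy: .* runs to the end of the string, where $ satisfies the lookahead,
# so the suffixes "-3" and " 2" stay attached
greedy <- str_extract(str_vec, "TAG .*(?=[\\ \\-]|$)")

# Lazy: .*? grows one character at a time and stops at the first position
# followed by a space, a hyphen, or the end of the string
lazy <- str_extract(str_vec, "TAG .*?(?=[ -]|$)")

print(greedy)  # "TAG a", "TAG b-3", "TAG c 2", "TAG d"
print(lazy)    # "TAG a", "TAG b", "TAG c", "TAG d"
```

The lazy version keeps the lookahead structure from the question; the negated character class in the first answer avoids the lookahead altogether.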

Answer:

You need a negated character class instead of a lookahead:

str_vec <- c("TAG a", "TAG b-3", "TAG c 2", "2 TAG d")
stringr::str_extract(str_vec, "TAG [^ -]*")
# => [1] "TAG a" "TAG b" "TAG c" "TAG d"

Details:

  • TAG (followed by a space) - a fixed string
  • [^ -]* - zero or more characters other than a space and a hyphen

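If you would rather not depend on stringr, a sketch of the same negated-class pattern in base R uses regexpr() with regmatches(). One assumption to be aware of: unlike str_extract(), which returns NA for strings without a match, regmatches() with regexpr() silently drops non-matching strings, so the result can be shorter than the input.

```r
str_vec <- c("TAG a", "TAG b-3", "TAG c 2", "2 TAG d")

# regexpr() locates the first match of the pattern in each string;
# regmatches() extracts the matched text (non-matching strings are dropped)
tags <- regmatches(str_vec, regexpr("TAG [^ -]*", str_vec))
print(tags)  # "TAG a", "TAG b", "TAG c", "TAG d"
```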

Answer:

How about:

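library(stringr)

str_vec <- c("TAG a", "TAG b-3", "TAG c 2", "2 TAG d")
# [a-z] matches exactly one lowercase letter after "TAG "
str_extract(str_vec, "TAG [a-z]")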

Output:

[1] "TAG a" "TAG b" "TAG c" "TAG d"
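Note that [a-z] matches exactly one character, so this works for the sample data only because every tag is a single lowercase letter. The sketch below uses hypothetical multi-letter tags (not from the question) to compare it with a [a-z]+ variant and with the negated-class pattern from the first answer:

```r
library(stringr)

# Hypothetical inputs with multi-letter tags
str_multi <- c("TAG abc-3", "TAG xy 2")

one_letter  <- str_extract(str_multi, "TAG [a-z]")   # one lowercase letter only
one_or_more <- str_extract(str_multi, "TAG [a-z]+")  # a run of lowercase letters
neg_class   <- str_extract(str_multi, "TAG [^ -]*")  # anything up to a space or hyphen

print(one_letter)   # "TAG a", "TAG x"
print(one_or_more)  # "TAG abc", "TAG xy"
print(neg_class)    # "TAG abc", "TAG xy"
```

The two last patterns differ only when a tag contains characters outside a-z (for example digits or uppercase letters), which [^ -]* keeps and [a-z]+ does not.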