Regular expression to extract words that start with a pattern but end before symbols or spaces

Time:12-23

I have the following example:

x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"

The expected result would be:

"proc" "proca" "procb" "procth"

The result could be a list or a vector.

I tried several regexes with stringr::str_extract_all, but could not get all the words I wanted.

CodePudding user response:

Use

pattern <- "\\bproc[[:alnum:]]*\\b"

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  proc                     'proc'
--------------------------------------------------------------------------------
  [[:alnum:]]*             any character of: letters and digits (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
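
As a minimal sketch of how this pattern behaves on the `x` from the question, the snippet below uses base R's `regmatches()` with `gregexpr()` as a stand-in for `stringr::str_extract_all(x, pattern)`. The variable name `matches` is only illustrative. Note that `proc_` is skipped because `_` counts as a word character, so there is no word boundary right after `proc`.

```r
x <- "carr proc proc_ proca select procb() procth;"
pattern <- "\\bproc[[:alnum:]]*\\b"

# gregexpr() locates every match; regmatches() extracts the matched text.
# perl = TRUE selects the PCRE engine, which supports \b and [[:alnum:]].
matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

print(matches)
# [1] "proc"   "proca"  "procb"  "procth"
```

The `[[1]]` is needed because `regmatches()` returns a list with one element per input string.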

CodePudding user response:

What about this?

> unique(agrep(pattern, unlist(strsplit(x, "[^[:alpha:]]+")), value = TRUE))
[1] "proc"   "proca"  "procb"  "procth"
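
Because `agrep()` does approximate (fuzzy) matching, it can also return tokens that merely resemble the pattern. If only exact prefixes are wanted, one possible variant (a sketch, not part of the original answer) splits on runs of non-letters and keeps the tokens that start with the literal prefix. The names `tokens` and `result` are illustrative.

```r
x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"

# Split on one or more non-letter characters to get candidate words.
tokens <- unlist(strsplit(x, "[^[:alpha:]]+"))

# Keep tokens that begin with the literal prefix, then drop duplicates.
result <- unique(tokens[startsWith(tokens, pattern)])

print(result)
# [1] "proc"   "proca"  "procb"  "procth"
```

Unlike the word-boundary regex, this approach turns `proc_` into the token `proc` before matching, so it cannot tell `proc_` apart from a standalone `proc`.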