Context
I have a vector a:
a = c('nameJack\n', 'name Lucy\n', 'name Rose\n', 'name Biden\n', 'name Peter\n')
Question
I want to extract the actual names from a, such as:
[1] "Jack" "Lucy" "Rose" "Biden" "Peter"
But the strings I extract always contain leading spaces.
What I've done
I tried:
> str_extract(a, "(?<=name\\s).*(?=\n)")
[1] NA "Lucy" " Rose" " Biden" " Peter"
Then I tried:
> str_extract(a, "(?<=name\\s*).*(?=\n)")
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT, context=`(?<=name\s*).*(?=
)`)
I also tried:
> str_extract(a, "(?<=name\\s{0,6}).*(?=\n)")
[1] "Jack" " Lucy" " Rose" " Biden" " Peter"
CodePudding user response:
Rather than matching the name with ".*", which also picks up the space characters, you could use "\\w+" instead to match one or more word characters:
library(stringr)
a <- c('nameJack\n', 'name Lucy\n', 'name Rose\n', 'name Biden\n', 'name Peter\n')
str_extract(a, "(?<=name\\s{0,6})\\w+(?=\n)")
#> [1] "Jack" "Lucy" "Rose" "Biden" "Peter"
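To see why ".*" keeps the leading spaces while "\\w+" does not, here is a minimal sketch. The vector b is a hypothetical input with zero to three spaces after "name" (an assumption; the exact spacing in the question's vector may differ):

```r
library(stringr)

# Hypothetical input: zero to three spaces between "name" and the actual name.
b <- c("nameJack\n", "name Lucy\n", "name  Rose\n", "name   Biden\n")

# The look-behind is already satisfied right after "name" (zero spaces),
# so ".*" starts matching there and swallows the spaces that follow.
with_dot <- str_extract(b, "(?<=name\\s{0,6}).*(?=\n)")

# "\\w+" cannot start on a space, so the match begins at the first letter.
with_word <- str_extract(b, "(?<=name\\s{0,6})\\w+(?=\n)")

print(with_dot)
print(with_word)
```

The regex engine takes the first position where the whole pattern can match, which is why the spaces end up inside the ".*" match rather than being consumed by the look-behind.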
Another approach is to use str_replace() with a capturing group. This frees you from needing look-behind/look-ahead assertions and leads to a somewhat more readable regex pattern:
str_replace(a, "name\\s*(\\w+)\n", "\\1")
#> [1] "Jack" "Lucy" "Rose" "Biden" "Peter"
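One thing to keep in mind with the str_replace() approach is that strings which do not match the pattern are returned unchanged. If you would rather get NA for those, str_match() with the same capturing group may be an option. A small sketch, using a hypothetical vector b whose last element deliberately does not follow the "name ..." format:

```r
library(stringr)

# Hypothetical input; the last element deliberately does not fit the pattern.
b <- c("nameJack\n", "name   Biden\n", "no match here\n")

# str_replace() leaves non-matching strings as they are.
replaced <- str_replace(b, "name\\s*(\\w+)\n", "\\1")

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 the first capture group; rows without a match are NA.
matched <- str_match(b, "name\\s*(\\w+)\n")[, 2]

print(replaced)
print(matched)
```

Which behaviour is preferable depends on whether unmatched input should pass through silently or be flagged as missing.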