regex extract the number that is on the line of "specificWord"

Time:11-09

I would like to know how to extract the number that is on the same line as "specificWord".

s <-"
01-06-2021
                line :                               0.15           
                Rate :                      0,30 %
                specificWord:                     0,14

01-06-2021
                line :                               2       
                Rate :                      0,30 %
                specificWord:                     0,20
                
01-06-2021
                line :                               1.15       
                Rate :                      1,05 %
                specificWord:                     1

"

library(stringr)
p <- "(?<=specificWord:\\s)\\d+,\\d*"
str_match_all(s, p)

CodePudding user response:

I get good results with this expression:

(?<=specificWord:\s+)\d(,\d+)?

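The main difference from your expression is the quantifier on the whitespace character in the positive lookbehind: your pattern allows exactly one whitespace character after the colon, while the input has a run of spaces there.

To use it in an R string, you of course need to escape the backslashes.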

Find an interactive example of the expression here if you need to tune it before heading back to your code: regexr.com/6956g
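
To show what the escaped version might look like back in R, here is a minimal sketch. It rests on two assumptions that are not part of the expression above: as far as I know, stringr's ICU engine rejects an unbounded \s+ inside a lookbehind, so the sketch uses a bounded \\s{1,100} instead (as another answer in this thread does), and it uses \\d+ before the comma so that integer parts with more than one digit also match. The sample string is a trimmed copy of the one in the question.

```r
library(stringr)

# Trimmed copy of the sample input from the question
s <- "
01-06-2021
    line :            0.15
    Rate :            0,30 %
    specificWord:     0,14

01-06-2021
    line :            2
    Rate :            0,30 %
    specificWord:     0,20

01-06-2021
    line :            1.15
    Rate :            1,05 %
    specificWord:     1
"

# Backslashes are doubled for the R string literal.
# \\s{1,100} is an assumed bounded stand-in for \s+ inside the lookbehind.
p <- "(?<=specificWord:\\s{1,100})\\d+(,\\d+)?"

res <- str_extract_all(s, p)[[1]]
print(res)
```

The optional group (,\\d+)? is what lets the last record, which has no decimal part, match as well.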

CodePudding user response:

You can try:

trimws(c(stringr::str_match_all(s, "(?<=specificWord:)\\s*\\d+,?\\d*")[[1]]))
#[1] "0,14" "0,20" "1"

or

x <- grep("specificWord:", strsplit(s, "\\n")[[1]], value = TRUE)
regmatches(x, regexpr("\\d+(,\\d+)?", x))
#[1] "0,14" "0,20" "1"
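
If you need this for more than one label, the second (base R) variant could be wrapped in a small function. This is only a sketch: extract_on_line is a hypothetical helper name, and it assumes the label itself contains no digits, so that the first number on each matching line is the one you want.

```r
# Hypothetical helper: keep the lines that contain `label`,
# then take the first number (with optional ",digits" part) on each of them.
extract_on_line <- function(text, label) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  # fixed = TRUE treats the label as literal text, not as a regex
  hits <- grep(label, lines, fixed = TRUE, value = TRUE)
  regmatches(hits, regexpr("\\d+(,\\d+)?", hits))
}

# Trimmed copy of the sample input from the question
s <- "
01-06-2021
    line :            0.15
    Rate :            0,30 %
    specificWord:     0,14

01-06-2021
    line :            2
    Rate :            0,30 %
    specificWord:     0,20

01-06-2021
    line :            1.15
    Rate :            1,05 %
    specificWord:     1
"

res <- extract_on_line(s, "specificWord:")
print(res)
```

Filtering the lines first means no lookbehind is needed, which sidesteps the differences between regex engines.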

CodePudding user response:

You can use str_extract_all to extract the target numbers, identified by the fact that they immediately follow the positive lookbehind (?<=specificWord:\\s{1,100}):

library(stringr)
unlist(str_extract_all(s, "(?<=specificWord:\\s{1,100})[\\d,]+"))
#[1] "0,14" "0,20" "1"

or:

str_extract_all(s, "(?<=specificWord:\\s{1,100})[\\d,]+")[[1]]