Regex extract third number - CodePudding

I have some strings and I am trying to extract the third number that appears in each of them (in R). This is an example string; all of them follow the same pattern:

string = "Speaks 2 times (1%) for a total of 34 words (1%)."

I have been able to get the first number with str_extract(string, "[0-9]+"), but I have no idea how to take only the third (the number of words). Any help would be greatly appreciated!

CodePudding user response:

A better approach would be to target the number that comes right before the text " words":

string <- "Speaks 2 times (1%) for a total of 34 words (1%)."
num <- sub("^.*\\b(\\d+) words\\b.*", "\\1", string)
num

[1] "34"
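Since the question mentions several strings, here is a minimal sketch of the same pattern applied to a whole vector and converted to integers. The second input string and the name `word_counts` are made up for illustration; note that `sub()` returns the input unchanged when the pattern does not match, so `as.integer()` would then give `NA` with a warning.

```r
# Hypothetical input vector; the second string is invented for illustration.
strings <- c(
  "Speaks 2 times (1%) for a total of 34 words (1%).",
  "Speaks 10 times (5%) for a total of 120 words (4%)."
)

# sub() is vectorized, so the pattern is applied to every element.
# The captured digits are strings, so convert them to integers.
word_counts <- as.integer(sub("^.*\\b(\\d+) words\\b.*", "\\1", strings))
word_counts
```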

CodePudding user response:

You can extract all matches and grab the 3rd item:

library(stringr)
string = "Speaks 2 times (1%) for a total of 34 words (1%)."
unlist(str_extract_all(string, "[0-9]+"))[3]
## => [1] "34"
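If you would rather not load stringr, the same "extract all matches, then index" idea can be sketched in base R with gregexpr() and regmatches(). The variable names `all_numbers` and `third` are only illustrative.

```r
string <- "Speaks 2 times (1%) for a total of 34 words (1%)."

# gregexpr() finds every run of digits; regmatches() pulls out the matched text.
# The result is a list with one element per input string, hence the [[1]].
all_numbers <- regmatches(string, gregexpr("[0-9]+", string))[[1]]

# Pick the third match; indexing past the last match gives NA.
third <- all_numbers[3]
third
```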

Also, you can use sub:

sub("^(?:\\D+\\d+){2}\\D+(\\d+).*", "\\1", string)
## => [1] "34"

Details:

^ - start of string
(?:\D+\d+){2} - two occurrences of one or more non-digit chars followed by one or more digits (note: replace {2} with {3} to extract the fourth number, or remove it to get the second number; adjust as you see fit)
\D+ - one or more non-digit chars
(\d+) - Group 1 (\1): one or more digits
.* - zero or more of any chars, as many as possible (this consumes the rest of the string so that only Group 1 is left after the replacement).
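The note about adjusting the {2} quantifier can be sketched as a small helper that builds the pattern for any position. The function name `nth_number` is hypothetical, and using `perl = TRUE` is my own choice here so that a `{0}` quantifier (for the first number) is handled by PCRE. Like the pattern above, it assumes the string starts with a non-digit.

```r
# Hypothetical helper: return the n-th number in each string, or NA if absent.
nth_number <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # Skip n - 1 "non-digits then digits" groups, then capture the next digits.
  pattern <- sprintf("^(?:\\D+\\d+){%d}\\D+(\\d+).*", as.integer(n - 1))
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching strings unchanged, so mark those as missing.
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

string <- "Speaks 2 times (1%) for a total of 34 words (1%)."
nth_number(string, 3)
nth_number(string, 5)
```

With the example string, the first call should give "34" and the second NA, because the string contains only four numbers.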