I have some strings and I am trying to extract the third number that appears in each of them (in R). This is an example string; all of them follow the same pattern:
string = "Speaks 2 times (1%) for a total of 34 words (1%)."
I have been able to get the first number with str_extract(string, "[0-9]+")
But I have no idea how to take only the third one (the number of words).
Any help would be greatly appreciated!
Answer:
A better approach would be to target the number that comes right before the word "words":
string <- "Speaks 2 times (1%) for a total of 34 words (1%)."
num <- sub("^.*\\b(\\d+) words\\b.*", "\\1", string)
num
[1] "34"
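Since all of the strings share the same pattern, the same sub() call can be applied to a whole character vector at once. This is a minimal sketch, assuming the strings are stored in a character vector called strings; only the first element comes from the question, the other two are made-up examples. One caveat: sub() returns a string unchanged when the pattern does not match, so the sketch marks those elements as NA before converting to integer.

```r
# Example input: the first string is from the question; the other two are
# hypothetical, made up to show a second match and a non-matching string.
strings <- c("Speaks 2 times (1%) for a total of 34 words (1%).",
             "Speaks 10 times (5%) for a total of 512 words (7%).",
             "Did not speak.")

# sub() is vectorised, so one call handles every element
num <- sub("^.*\\b(\\d+) words\\b.*", "\\1", strings)

# sub() leaves non-matching strings untouched, so replace those with NA
num[!grepl("\\b\\d+ words\\b", strings)] <- NA

n_words <- as.integer(num)
n_words
## => [1]  34 512  NA
```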
Answer:
You can extract all matches and grab the 3rd item:
library(stringr)
string = "Speaks 2 times (1%) for a total of 34 words (1%)."
unlist(str_extract_all(string, "[0-9]+"))[3]
## => [1] "34"
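Note that unlist(...)[3] works for a single string only: on a vector of strings, unlist() flattens all matches together, so [3] would return the third number overall rather than the third number of each string. Below is a base R sketch (no stringr needed) that keeps one set of matches per string and takes the third from each; the second string is a hypothetical example with the same pattern.

```r
strings <- c("Speaks 2 times (1%) for a total of 34 words (1%).",
             "Speaks 10 times (5%) for a total of 512 words (7%).")  # 2nd is made up

# gregexpr() locates every run of digits; regmatches() returns them as a
# list with one character vector per input string
all_nums <- regmatches(strings, gregexpr("[0-9]+", strings))

# Take the 3rd match from each element; m[3] is NA if a string has fewer numbers
third <- as.integer(vapply(all_nums, function(m) m[3], character(1)))
third
## => [1]  34 512
```

str_extract_all() returns the same list-of-character-vectors shape, so the same vapply() call works on its result too.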
You can also use sub:
sub("^(?:\\D+\\d+){2}\\D+(\\d+).*", "\\1", string)
## => [1] "34"
Pattern details:
^ - start of string
(?:\D+\d+){2} - two occurrences of one or more non-digit chars followed by one or more digits (note: replace {2} with {3} to extract the fourth number, or remove it to get the second number; adjust as you see fit)
\D+ - one or more non-digit chars
(\d+) - Group 1 (\1): one or more digits
.* - any zero or more chars, as many as possible.
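The note about changing {2} can be wrapped in a small function. This is a hypothetical helper (nth_number is not part of base R or stringr) that builds the quantifier from n. It passes perl = TRUE so that PCRE handles the (?:...) group and the {0} quantifier needed for n = 1, and it returns NA instead of the unchanged input when a string has fewer than n numbers.

```r
# Hypothetical helper: the n-th number in each string, using the same sub()
# pattern with the {2} quantifier replaced by {n - 1}.
nth_number <- function(x, n) {
  stopifnot(n >= 1)
  # {n - 1} skips the first n - 1 numbers; \D+ keeps numbers from being split
  pattern <- sprintf("^(?:\\D+\\d+){%d}\\D+(\\d+).*", as.integer(n) - 1L)
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching strings untouched; turn those into NA
  out[!grepl(pattern, x, perl = TRUE)] <- NA
  as.integer(out)
}

string <- "Speaks 2 times (1%) for a total of 34 words (1%)."
nth_number(string, 3)  ## => [1] 34
nth_number(string, 2)  ## => [1] 1
nth_number(string, 5)  ## => [1] NA  (the string only has four numbers)
```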