I have a string variable that contains a lot of text. I want to search it for an upper boundary ("UpperBoundary"), then keep reading past that point until a lower boundary ("LowerBoundary") is found, and return the text between those two boundaries.
For example, the upper boundary would be ""Country":"" and the ending boundary would be "",".
This is a snippet of the text I'm dealing with:
> }],"Country":"United States",
> }],"Country":"China",
So I want the results to come back:
> United States
> China
What code or function can do this? I've been searching for a long time and have tried numerous things (stri, grep, find, etc.), but I can't get anything to do what I'm looking for. Thank you for your help!
CodePudding user response:
Here's a regex method, though as I mentioned in comments I'd strongly recommend using, e.g., the jsonlite package instead.
# input:
x = c('> }],"Country":"United States",',
'> }],"Country":"China",')
library(stringr)
# look-behind for Country":", then one or more non-quote characters, then look-ahead for ",
result = str_extract(x, pattern = '(?<=Country":")[^"]+(?=",)')
result
# [1] "United States" "China"
Explanation:
`(?<=...)` is the look-behind pattern. So we're looking behind (before) the match for `Country":"`.
`[^"]+` is our main pattern. `^` in brackets is "not", so we're looking for any character that is not a `"`. And `+` means "one or more", so we match one or more non-`"` characters.
`(?=...)` is the look-ahead pattern. So we're looking after the match for `",`.
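If you want the general "text between two boundaries" behaviour from the question without writing a regex, something like the following base R sketch might work. The function name `extract_between` and its arguments are made up for illustration, and it assumes both boundaries are literal strings and that only the first occurrence per element matters. It returns `NA` when either boundary is missing.

```r
# Hypothetical helper (not from any package): return the text between the
# first occurrence of `upper` and the next occurrence of `lower`.
extract_between <- function(text, upper, lower) {
  vapply(text, function(s) {
    # fixed = TRUE treats the boundary as a literal string, not a regex
    start <- regexpr(upper, s, fixed = TRUE)
    if (start == -1) return(NA_character_)
    # keep everything after the upper boundary
    rest <- substring(s, start + nchar(upper))
    end <- regexpr(lower, rest, fixed = TRUE)
    if (end == -1) return(NA_character_)
    substring(rest, 1, end - 1)
  }, character(1), USE.NAMES = FALSE)
}

x <- c('}],"Country":"United States",',
       '}],"Country":"China",')
extract_between(x, '"Country":"', '",')
# [1] "United States" "China"
```

Because the boundaries are matched literally, characters such as `"`, `[` or `.` in them need no escaping. For real JSON input, a parser is still the more robust choice.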