I'm struggling with a regex pattern and need some help. I'm trying to scrape some JavaScript content to capture label and value pairs.
I started with this
"label" : "(.*)"[\s\S]*?"value" : "(.*)"
to scrape something like this
"label" : "Something ?",
blabla
blabla2
"value" : "NO"
blabla
"label" : "Foo ?",
blabla
blabla2
"value" : "Bar"
It works: it captures the label in the first group and the value in the second.
Something ? / NO
Foo ? / Bar
My problem is that sometimes I have a label but NOT a value. In that case I would like to NOT match the isolated label.
E.g.:
"label" : "Something ?",
blabla
blabla2
"value" : "NO"
blabla
"label" : "YouMad ?",
blabla
"label" : "Foo ?",
blabla
blabla2
"value" : "Bar"
gives
Something ? / NO
YouMad ? / Bar
I tried a negative lookahead and a few other things, but I'm stuck.
Any help will be appreciated!
Thanks
https://regex101.com/r/gG3XxR/1
CodePudding user response:
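You should stop the match from crossing another "label" by using (?:(?!"label")[\s\S])* in place of [\s\S]*?. This consumes one character at a time, but only while the text at that position does not start a new "label".
Also use a negated character class, [^"]*, instead of .* inside the quotes to prevent some backtracking.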
"label" : "([^"]*)"(?:(?!"label")[\s\S])*"value" : "([^"]*)"