How to use regex to look around a complex pattern?

Time:10-25

I have the following HTML element in Sublime Text:

<div class="exg"><div><strong class="syn">investigate</strong><span class="syn">, conduct investigations into, make inquiries into, inquire into, probe, examine, explore, research, study, look into, go into</span></div>

I want to use a regex to select the content of this element from the 5th comma (inclusive) onward, stopping before </span></div>. In this case, I'd want to select:

, examine, explore, research, study, look into, go into

So far, I've written this regex, which works:

(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+</strong><span class="syn">((\,((\s)?(\w+)?)+)?){5})
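To see exactly what this prefix regex consumes, it can be run outside the editor. The sketch below uses Python's `re` module purely for illustration (Sublime Text has its own engine, but for this input the leftmost greedy match should be the same), and it assumes the quantifiers in the pattern are `+`, as shown above. It prints whatever text is left after the match.

```python
import re

# The HTML line from the question.
html = ('<div class="exg"><div><strong class="syn">investigate</strong>'
        '<span class="syn">, conduct investigations into, make inquiries into, '
        'inquire into, probe, examine, explore, research, study, look into, '
        'go into</span></div>')

# The question's prefix regex (assumption: the quantifiers are "+").
prefix = re.compile(
    r'(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+'
    r'</strong><span class="syn">((\,((\s)?(\w+)?)+)?){5})'
)

m = prefix.search(html)
# {5} consumes five comma-separated items (through ", examine"),
# so the leftover text starts at the 6th comma, not the 5th.
print(html[m.end():])
```

The leftover begins with ", explore", which shows that {5} consumes one item more than the desired prefix.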

It selects the part that comes just before the text I need. I tried to turn it into a positive lookbehind, but it doesn't work and I can't figure out how to fix it. Here is what I tried:

(?<=(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+</strong><span class="syn">((\,((\s)?(\w+)?)+)?){3}))((\,?((\s)?(\w+)?)+?)+)
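One likely reason this attempt fails is that the lookbehind contains `+` and `?` quantifiers, so its length is not fixed. Python's `re` module, used here only as an illustration, refuses to compile such a pattern. The sketch assumes the quantifiers are `+` and `+?`, as written above, and simply captures the compile error.

```python
import re

# The question's lookbehind attempt (assumption: quantifiers are "+" / "+?").
attempt = (
    r'(?<=(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+'
    r'</strong><span class="syn">((\,((\s)?(\w+)?)+)?){3}))'
    r'((\,?((\s)?(\w+)?)+?)+)'
)

try:
    re.compile(attempt)
    error_message = None
except re.error as exc:
    # Python only accepts lookbehind assertions with a fixed width.
    error_message = str(exc)

print(error_message)
```

The message should mention that the look-behind needs a fixed-width pattern.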

Answer:

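You make heavy use of parentheses, and your expression for matching the words between commas could be simpler. Your lookbehind also can't work as written: it contains variable-length quantifiers such as + and ?, and as far as I know the Boost engine behind Sublime Text's Find only accepts fixed-length lookbehinds. So keep only the fixed literal inside the lookbehind and match the rest normally. If you also replace your groups with non-capturing ones, you'll get the expected text in your first (and only) capturing group with this regex: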

(?<=<div class="exg"><div><strong class="syn">)(?:\s?\w)*<\/strong><span class="syn">(?:,(?:\s?\w)*){4}(.*?)(?=<\/span><\/div>)
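Here is a quick check of this regex using Python's `re` module (an illustration only, not the editor itself). The lookbehind is a fixed literal, so Python accepts it. The full match still starts at "investigate", which is why the wanted text has to be read from group 1.

```python
import re

# The HTML line from the question.
html = ('<div class="exg"><div><strong class="syn">investigate</strong>'
        '<span class="syn">, conduct investigations into, make inquiries into, '
        'inquire into, probe, examine, explore, research, study, look into, '
        'go into</span></div>')

# Fixed-length lookbehind, then the headword, then four comma-separated
# items consumed normally, then a lazy capture up to </span></div>.
pattern = re.compile(
    r'(?<=<div class="exg"><div><strong class="syn">)(?:\s?\w)*'
    r'<\/strong><span class="syn">(?:,(?:\s?\w)*){4}(.*?)(?=<\/span><\/div>)'
)

m = pattern.search(html)
print(m.group(1))  # the text from the 5th comma up to </span></div>
```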

By the way, if you want the match to start at the 5th comma, I think your quantifier should be {4}, since only the first four comma-separated items should be consumed before it (but I might have misunderstood).


Update: if you're looking to delete the matched text (i.e., replace it with an empty string), just do the opposite: capture one group before it and one after it:

(<div class="exg"><div><strong class="syn">(?:\s?\w)*<\/strong><span class="syn">(?:,(?:\s?\w)*){4}).*?(<\/span><\/div>)

Then, in your editor, replace with \1\2 (the two groups one after the other, without the previously matched string in between).
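The same find-and-replace can be reproduced with Python's `re.sub`, used here only as a stand-in for the editor's Replace. The replacement `\1\2` is written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# The HTML line from the question.
html = ('<div class="exg"><div><strong class="syn">investigate</strong>'
        '<span class="syn">, conduct investigations into, make inquiries into, '
        'inquire into, probe, examine, explore, research, study, look into, '
        'go into</span></div>')

# Group 1: everything up to the 5th comma; group 2: the closing tags.
# The lazy .*? between them is the text that gets dropped.
pattern = re.compile(
    r'(<div class="exg"><div><strong class="syn">(?:\s?\w)*<\/strong>'
    r'<span class="syn">(?:,(?:\s?\w)*){4}).*?(<\/span><\/div>)'
)

# Writing back only the two groups deletes everything between them.
result = pattern.sub(r'\1\2', html)
print(result)
```

The printed line should end with "..., probe</span></div>".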
