Regex capture substrings while omitting certain substring

Time:05-30

I want to use a regex to capture the color, animal, and country from the following HTML. The country name, however, may be preceded by a <br> tag, as with SPAIN in my example. I want to omit that <br> tag so that only "SPAIN" is captured.

<p><span >RED</span><br><span >DOG</span>USA</p>
<p><span >GREEN</span><br><span >CAT</span><br>SPAIN</p>
<p><span >BLUE</span><br><span >MOUSE</span>FRANCE</p>

I have the following regex, but it doesn't omit the country <br> tag:

/<p><span >(.*)<\/span><br><span >(.*)<\/span>(.*)<\/p>/

Please help.

CodePudding user response:

Try this:

<p><span >(.*)<\/span><br><span >(.*)<\/span>(?:<br>)?(.*)<\/p>

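(?:...) : non-capturing group; it groups the <br> without adding a capture of its own.

? : matches the preceding group 0 or 1 times, so the <br> is optional.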

check pattern: Regex101
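
To see the optional non-capturing group at work, here is a minimal sketch in Python (an assumption, since the question does not name a language). It runs the same pattern over the three sample lines. The slashes are left unescaped because Python patterns have no / delimiters, and the names html, pattern, and rows are only illustrative.

```python
import re

# Sample markup copied from the question.
html = """<p><span >RED</span><br><span >DOG</span>USA</p>
<p><span >GREEN</span><br><span >CAT</span><br>SPAIN</p>
<p><span >BLUE</span><br><span >MOUSE</span>FRANCE</p>"""

# (?:<br>)? consumes an optional <br> before the country without capturing it.
pattern = re.compile(
    r"<p><span >(.*)</span><br><span >(.*)</span>(?:<br>)?(.*)</p>"
)

# findall returns one (color, animal, country) tuple per matching line,
# because "." does not cross newlines by default.
rows = pattern.findall(html)
for color, animal, country in rows:
    print(color, animal, country)
```

Because the optional group is tried before the last (.*), the <br> in the SPAIN line is consumed by the group and never reaches the third capture.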

CodePudding user response:

You can try this to match only the uppercase text between > and < (the POSIX class [[:upper:]] needs an engine that supports it, such as PCRE):

(?<=>)([[:upper:]]+)(?=<)

View Demo
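
The lookaround approach can be sketched in Python as well, with one substitution: Python's re module has no POSIX classes, so [A-Z]+ stands in for [[:upper:]]+ here. Grouping the matches in threes is an assumption that every <p> holds exactly one color, one animal, and one country. The names words and triples are illustrative.

```python
import re

# Sample markup copied from the question.
html = """<p><span >RED</span><br><span >DOG</span>USA</p>
<p><span >GREEN</span><br><span >CAT</span><br>SPAIN</p>
<p><span >BLUE</span><br><span >MOUSE</span>FRANCE</p>"""

# (?<=>) requires a ">" just before the match and (?=<) a "<" just after,
# so only the uppercase text sitting between tags is returned.
# [A-Z]+ replaces [[:upper:]]+, which Python's re does not support.
words = re.findall(r"(?<=>)([A-Z]+)(?=<)", html)

# Assumption: three values per <p>, so consecutive matches form one record.
triples = [tuple(words[i:i + 3]) for i in range(0, len(words), 3)]
print(triples)
```

Since the <br> before SPAIN contains no text between > and <, it is skipped without any special handling. The trade-off is that this pattern no longer checks the surrounding <p> and <span> structure.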
