How to find/replace with regex in Sublime Text between two H3 tags?

Time:12-17

I have 250 blocks of HTML list items, and I need to remove specific lines between <h3></h3> tags.

The lines (including h3, li, a) that need to be removed will contain "USPS".

<ul>
   <h3>
      <li><a href="medicine/Alabama/Birmingham">Medicine in Birmingham, AL</a>
      </li>
   </h3>
   <h3>
      <li><a href="/shampoo/Alabama/Birmingham">Shampoo in Birmingham, AL</a>
      </li>
   </h3>
   <h3>
      <li><a href="/usps/Alabama/Birmingham">USPS in Birmingham, AL</a></li>
   </h3>
   <h3>
      <li><a href="/snacks/Alabama/Birmingham">Snacks in Birmingham, AL</a></li>
   </h3>
</ul>
<ul>
   <h3>
      <li><a href="/medicine/Arizona/Mesa">Medicine in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/shampoo/Arizona/Mesa">Shampoo in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/usps/Arizona/Mesa">USPS in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/snacks/Arizona/Mesa">Snacks in Mesa, AZ</a></li>
   </h3>
</ul>

I have tried using regex, but it's removing too much. I have a saved link here for the latest regex attempt: https://regex101.com/r/l4Ud4v/1

(?s)<h3>.*USPS.*?<\/h3>

Desired results:

<ul>
   <h3>
      <li><a href="medicine/Alabama/Birmingham">Medicine in Birmingham, AL</a>
      </li>
   </h3>
   <h3>
      <li><a href="/shampoo/Alabama/Birmingham">Shampoo in Birmingham, AL</a>
      </li>
   </h3>
   <h3>
      <li><a href="/snacks/Alabama/Birmingham">Snacks in Birmingham, AL</a></li>
   </h3>
</ul>
<ul>
   <h3>
      <li><a href="/medicine/Arizona/Mesa">Medicine in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/shampoo/Arizona/Mesa">Shampoo in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/snacks/Arizona/Mesa">Snacks in Mesa, AZ</a></li>
   </h3>
</ul>

There are 250 of these "USPS" instances that need to be removed while preserving the rest of the HTML.

CodePudding user response:

Try

(?s)<h3>(?:(?!</h3>).)*USPS.*?</h3>

https://regex101.com/r/AB6wxS/1

Even the non-greedy (?s)<h3>.*?USPS.*?</h3> fails: the match starts at the first <h3>, and .*? keeps consuming until it reaches USPS, running across the closing tags of the earlier blocks. To avoid that, use (?:(?!</h3>).)*, which matches any character as long as it is not the start of </h3>, so the match cannot leave the current <h3> block before finding USPS.
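
To see the difference outside Sublime, here is a minimal Python sketch that runs both patterns over a shortened copy of the sample from the question. The variable names (html, lazy, tempered) are only illustrative, and Python's re module stands in for Sublime's regex engine, so treat it as an approximation of what the editor does rather than an exact reproduction.

```python
import re

# Shortened copy of the question's sample: three <h3> blocks, one with USPS.
html = """<ul>
   <h3>
      <li><a href="/shampoo/Alabama/Birmingham">Shampoo in Birmingham, AL</a>
      </li>
   </h3>
   <h3>
      <li><a href="/usps/Alabama/Birmingham">USPS in Birmingham, AL</a></li>
   </h3>
   <h3>
      <li><a href="/snacks/Alabama/Birmingham">Snacks in Birmingham, AL</a></li>
   </h3>
</ul>
"""

# Non-greedy attempt: starts at the first <h3> and runs across the
# Shampoo block's closing tag until it finally reaches USPS.
lazy = re.compile(r"(?s)<h3>.*?USPS.*?</h3>")

# Tempered pattern from this answer: each character is consumed only if
# it does not begin "</h3>", so the match stays inside a single block.
tempered = re.compile(r"(?s)<h3>(?:(?!</h3>).)*USPS.*?</h3>")

lazy_result = lazy.sub("", html)
tempered_result = tempered.sub("", html)

print("lazy keeps Shampoo:    ", "Shampoo" in lazy_result)
print("tempered keeps Shampoo:", "Shampoo" in tempered_result)
print(tempered_result)
```

Replacing with an empty string leaves a whitespace-only line where the block used to be. Prefixing the pattern with \s* is one way to absorb the indentation and line break in front of the removed <h3>.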

CodePudding user response:

If all the blocks have that exact structure (h3, then li, then a), and you want to match them in Sublime:

<h3>\s*<li>\s*<a\b[^<>]*>[^<>]*\bUSPS\b[^<>]*</a>\s*</li>\s*</h3>

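The \s* matches optional whitespace characters (including newlines), and [^<>]* is a negated character class that matches any character, newlines included, except for < and >. Because no part of the pattern can cross a < or >, the match cannot spill over into a neighbouring block.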

See a regex demo.
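
As a rough check of this stricter pattern, the following Python sketch applies it to a shortened sample. The names strict, sample, and cleaned are made up for the example, and Python's re module is assumed to behave like Sublime's engine for these constructs.

```python
import re

# Structure-specific pattern: <h3>, <li>, <a ...>text with USPS</a>, then the
# closing tags, with optional whitespace (including newlines) in between.
strict = re.compile(
    r"<h3>\s*<li>\s*<a\b[^<>]*>[^<>]*\bUSPS\b[^<>]*</a>\s*</li>\s*</h3>"
)

# Shortened sample: the Medicine block has </li> on its own line,
# the USPS and Snacks blocks keep it on the same line.
sample = """<ul>
   <h3>
      <li><a href="/medicine/Arizona/Mesa">Medicine in Mesa, AZ</a>
      </li>
   </h3>
   <h3>
      <li><a href="/usps/Arizona/Mesa">USPS in Mesa, AZ</a></li>
   </h3>
   <h3>
      <li><a href="/snacks/Arizona/Mesa">Snacks in Mesa, AZ</a></li>
   </h3>
</ul>
"""

matches = strict.findall(sample)   # only the USPS block should match
cleaned = strict.sub("", sample)   # remove it, keep everything else

print(len(matches))
print(cleaned)
```

This version needs no (?s) flag because it contains no dot. The trade-off is that it only matches blocks with exactly this h3, li, a nesting, so a block with extra tags inside the link would be skipped.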
