Replacing Heading <hn> without class Tags with <p>

Time:03-16

I want to replace hn tags that do not contain a class attribute with p tags. The idea is to match anything that follows hn, up to the closing >, as long as it does not contain the string class.

This is my first attempt:

<?php

$content = <<<HTML
<h1 style="color:black">test1</h1>
<H2 >test2</H2>
<h5 >test</h5>
<h5 >test test</h5>
HTML;

$content = preg_replace('#<h([1-6])((?!class).)*?>(.*?)<\/h[1-6]>#si', '<p  ${2}>${3}</p>', $content);

echo ($content);

The result is:

<p  ">test1</p>
<H2 >test2</H2>
<h5 >test</h5>
<h5 >test test</h5>

It should be:

<p  style="color:black">test1</p>
<H2 >test2</H2>
<h5 >test</h5>
<h5 >test test</h5>

Any idea why $2 maps to " instead of style="color:black"?

Answer:

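The capturing group is in the wrong place. When a capturing group is itself repeated, as in ((?!class).)*?, it keeps only the text from its last iteration, which here is the single character " just before >.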

Replace ((?!class).)*? with ((?:(?!class).)*?).
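
To make the difference concrete, here is a minimal sketch that runs both group placements against one opening tag. The sample tag and the variable names ($tag, $repeated, $wrapped) are made up for this illustration and do not come from the question.

```php
<?php
// Hypothetical sample tag, used only for this illustration.
$tag = '<h1 style="color:black">';

// Original attempt: the capturing group itself is repeated, so group 2
// is overwritten on every iteration and ends up holding one character.
preg_match('#<h([1-6])((?!class).)*?>#si', $tag, $repeated);

// Suggested fix: the repetition happens inside a non-capturing group,
// and the capturing group wraps the whole repeated run.
preg_match('#<h([1-6])((?:(?!class).)*?)>#si', $tag, $wrapped);

var_dump($repeated[2]); // only the last character matched before ">"
var_dump($wrapped[2]);  // the full attribute text, leading space included
```

The second pattern has no \s* yet, so the captured text still starts with the space that follows the tag name.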

Use

'#<h([1-6])\s*((?:(?!class).)*?)>(.*?)</h[1-6]>#si'
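
As a rough end-to-end check, the pattern can be dropped into preg_replace with the replacement string from the question. The input below is an assumption made for this sketch; in particular the class value "title" is a placeholder, not something taken from the original post.

```php
<?php
// Sample input; the class value "title" is a placeholder for this sketch.
$content = <<<HTML
<h1 style="color:black">test1</h1>
<H2 class="title">test2</H2>
<h5 >test</h5>
HTML;

$pattern = '#<h([1-6])\s*((?:(?!class).)*?)>(.*?)</h[1-6]>#si';

// ${2} now holds the complete attribute text, ${3} the element content.
$result = preg_replace($pattern, '<p  ${2}>${3}</p>', $content);

echo $result, PHP_EOL;
```

With this input the h1 keeps its style attribute, the heading that carries a class attribute is left untouched, and the bare h5 becomes a p with no attributes. Note that (?!class) rejects any tag whose attribute text contains the letters class anywhere, so a tag with title="classic" would also be skipped.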


EXPLANATION

--------------------------------------------------------------------------------
  <h                       '<h'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [1-6]                    any character of: '1' to '6'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        class                    'class'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character (\n included, because
                               the s flag is set)
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*?                      any character, \n included because the
                             s flag is set (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  </h                      '</h'
--------------------------------------------------------------------------------
  [1-6]                    any character of: '1' to '6'
--------------------------------------------------------------------------------
  >                        '>'
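
The breakdown above can also be kept next to the code by writing the same pattern with the x flag, which ignores whitespace in the pattern and allows # comments. This is only a sketch: the ~ delimiter is swapped in because # starts a comment in x mode, and the sample inputs are hypothetical.

```php
<?php
// Same pattern as in the answer, spread over several lines with the x flag.
// A nowdoc keeps the pattern free of PHP string escaping.
$pattern = <<<'REGEX'
~
<h ([1-6])                # group 1: heading level
\s*                       # optional whitespace after the tag name
( (?: (?!class) . )*? )   # group 2: attribute text with no "class" in it
>                         # end of the opening tag
(.*?)                     # group 3: element content, as short as possible
</h [1-6] >               # closing heading tag
~six
REGEX;

// Hypothetical inputs: one heading without a class, one with.
$plain   = preg_replace($pattern, '<p  ${2}>${3}</p>', '<h3 id="intro">Intro</h3>');
$classed = preg_replace($pattern, '<p  ${2}>${3}</p>', '<h3 class="x">Intro</h3>');

echo $plain, PHP_EOL;   // converted to a p tag, id attribute kept
echo $classed, PHP_EOL; // left unchanged
```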