I'm trying to match a substring that starts with #1-9 (a # followed by a digit from 1 to 9) and ends either at the next #1-9 or at the end of the string.

Full string : "#1Lorem Ipsum is simply dummy text#2printing and typesetting industry"

Idea:

Replace #1Lorem Ipsum is simply dummy text with <span class="one">Lorem Ipsum is simply dummy text</span>

and #2printing and typesetting industry with <span class="two">printing and typesetting industry</span>

In other words, replace each #1-9 with an opening <span> tag and append the closing </span> tag at the end of each segment.

But:

Suppose the string contains only one segment starting with #1-9, like this:

"#1Lorem Ipsum is simply dummy text" How could I put </span> at the end to close the <span> tag?

I'm guessing I could use the final " at the end of the text and insert the closing </span> tag before it, since there is no further #1-9 to stop at, but without losing or replacing that final ".

So it becomes: "<span class="one">Lorem Ipsum is simply dummy text</span>"

Regex I've tried: (#[0-9])(.*?)(#|") but it only matches the first part (#1) of the string and ignores the #2 part (see the full string).

I will be using PHP to match and replace, probably with preg_replace; I just need to work out the regex part first.

How can I achieve this?

CodePudding user response：

What you are looking for is a

Which was generated using regexper.com

CodePudding user response：

<?php
function convert($str) {
    static $numberNamesMap = [
        1 => 'one',
        2 => 'two',
        3 => 'three',
        4 => 'four',
        5 => 'five',
        6 => 'six',
        7 => 'seven',
        8 => 'eight',
        9 => 'nine',
    ];
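    // Match "#" plus a digit 1-9, then capture every character up to
    // (but not including) the next "#1".."#9" marker or the end of the string.
    return preg_replace_callback(
        '~#([1-9])(((?!#[1-9]).)*)~',
        function($matches) use ($numberNamesMap) {
            $class = $numberNamesMap[$matches[1]];
            // Escape the captured text so it is safe to embed in HTML.
            $htmlText = htmlentities($matches[2]);
            return "<span class=\"$class\">$htmlText</span>";
        },
        $str
    );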
}

Examples

echo convert('#1Lorem Ipsum is simply dummy text');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span>

echo convert('#1Lorem Ipsum is simply dummy text#2printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span>

echo convert('#1Lorem Ipsum is simply dummy text#0printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text#0printing and typesetting industry</span>
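
To see what the pattern above actually captures, here is a small sketch that lists the digit and the text of each segment instead of replacing them. The helper name listSegments is made up for this illustration and is not part of the answer's code; only the regex is taken from it.

```php
<?php
// Hypothetical helper (illustration only): return [digit, text] pairs
// captured by the same pattern that convert() uses.
function listSegments(string $str): array {
    preg_match_all('~#([1-9])(((?!#[1-9]).)*)~', $str, $matches, PREG_SET_ORDER);
    $segments = [];
    foreach ($matches as $m) {
        // $m[1] is the digit after "#"; $m[2] is the text that runs up to
        // the next "#1".."#9" marker or the end of the string.
        $segments[] = [$m[1], $m[2]];
    }
    return $segments;
}

print_r(listSegments('#1Lorem Ipsum#2printing#0industry'));
```

The negative lookahead (?!#[1-9]) is checked before every character, so a segment stops just before the next marker without consuming it. That is why "#0" stays inside the second segment here, and why the next "#2" is still available to start its own match.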

CodePudding user response：

preg_replace_callback() is the right tool for this job. To avoid manually declaring a number-mapping array, you can use the NumberFormatter class (provided by the intl extension). Using sprintf() in the callback body helps to separate the data from the HTML and makes maintenance easier.

Code:

$string = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry#0nothing#35That\'s a big one!';

echo preg_replace_callback(
         '/#(\d+)((?:(?!#\d).)+)/',
         fn($m) => sprintf(
             '<span class="%s">%s</span>',
             (new NumberFormatter("en", NumberFormatter::SPELLOUT))->format($m[1]),
             htmlentities($m[2])
         ),
         $string
     );

Output:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span><span class="zero">nothing</span><span class="thirty-five">That&#039;s a big one!</span>

Note that if the text after each #[number] NEVER contains a # symbol, you can DRAMATICALLY improve the regex performance by using a greedy negated character class as the second capture group: #(\d+)([^#]+). This reduces the step count from 283 steps to just 16 steps on your sample string.
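
A rough sketch of why that "NEVER" matters, using a hypothetical helper (textsAfterMarkers is a name invented for this example) that returns only the text captured after each marker:

```php
<?php
// Hypothetical helper (illustration only): return the text captured
// after each "#number" marker by the negated character class pattern.
function textsAfterMarkers(string $str): array {
    preg_match_all('/#(\d+)([^#]+)/', $str, $matches);
    return $matches[2];
}

// No stray "#" in the text: both segments are captured whole.
print_r(textsAfterMarkers('#1Lorem Ipsum#2printing'));

// A "#" that is not part of a marker cuts the first segment short.
print_r(textsAfterMarkers('#1C# is fun#2printing'));
```

In the second call the first segment ends at the stray "#", so only "C" is captured and "# is fun" is left outside any match. If your input can look like that, the lookahead-based pattern is probably the safer choice.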

To be perfectly honest, even a lazy pattern like #(\d+)(.+?(?=#\d|$)) will process the sample string in 213 steps. Performance might not be a factor, so use whichever regex you are most comfortable reading.
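
If you want to convince yourself that the choice is only about readability and speed, a quick check along these lines should do. It is only a sketch: the array keys are labels made up for this example, and the sample is the string from the question.

```php
<?php
$sample = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry';

// The three patterns discussed in this answer (keys are just labels).
$patterns = [
    'tempered' => '/#(\d+)((?:(?!#\d).)+)/',
    'negated'  => '/#(\d+)([^#]+)/',
    'lazy'     => '/#(\d+)(.+?(?=#\d|$))/',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $sample, $matches);
    // Keep only the captured text of each segment.
    $results[$name] = $matches[2];
}

// True when all three patterns captured exactly the same segments.
var_dump(
    $results['tempered'] === $results['negated']
    && $results['negated'] === $results['lazy']
);
```

On this sample all three patterns capture the same two segments. They only diverge on input where a # appears that is not followed by a digit.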