Replace BBCodes with HTML codes and vice versa

I have a sentence containing BBCodes and I would like to replace them with HTML tags:

$sentence = '[html style="font-size: 18px;" dir="ltr"][div style="font-size: 18px;" dir="ltr"][p style="font-weight: bold;"]Hello,[/p][p]You have got a new message from [a href="https://www.example.com/"]Example.com[/a][br][br].You could check your message on [a href="https://www.example.com/en/manager/inbox.html"]Manager[/a][/p][p][img src="https://www.example.com/assets/images/logo-default-120x50.png" width="120px" height="80px"][div style="color: #D4192D; font-weight: bold;"]Example.com Team[/div][/p][/div][/html]';


$htmlTags = '<$1>$2</$3>';
$bbTags = '/\[(.*)\](.*)\[\/(.*)\]/'; 


$new = preg_replace($bbTags, $htmlTags, $sentence);
echo $new;

The output is:

<html style="font-size: 18px;" dir="ltr"][div style="font-size: 18px;" dir="ltr"][p style="font-weight: bold;"]Hello,[/p][p]You have got a new message from [a href="https://www.example.com/"]Example.com[/a][br][br].You could check your message on [a href="https://www.example.com/en/manager/inbox.html"]Manager[/a][/p][p][img src="https://www.example.com/assets/images/logo-default-120x50.png" width="120px" height="80px"][div style="color: #D4192D; font-weight: bold;"]Example.com Team[/div][/p][/div></html>

So only the first opening bracket and the last closing tag are converted, not the whole sentence.

I do not want to maintain an array of codes with their replacements.

PS: The sentence can differ from case to case.

CodePudding user response：

You can use the following PHP code:

<?php

$sentence = '[html style="font-size: 18px;" dir="ltr"][div style="font-size: 18px;" dir="ltr"][p style="font-weight: bold;"]Hello,[/p][p]You have got a new message from [a href="https://www.example.com/"]Example.com[/a][br][br].You could check your message on [a href="https://www.example.com/en/manager/inbox.html"]Manager[/a][/p][p][img src="https://www.example.com/assets/images/logo-default-120x50.png" width="120px" height="80px"][div style="color: #D4192D; font-weight: bold;"]Example.com Team[/div][/p][/div][/html]';

$rx = '~\[((\w+)\b[^]]*)\]((?>(?!\[\2\b).|(?R))*)\[\/\2]~s';
$tmp = '';
while (preg_match($rx, $sentence) && $tmp != $sentence) {
    $tmp = $sentence;
    $sentence = preg_replace($rx, '<$1>$3</$2>', $sentence);
}
$sentence = preg_replace('~\[([^]]*)]~', '<$1 />', $sentence);
echo $sentence;

Output (indented here for readability; the echoed string itself contains no line breaks):

<html style="font-size: 18px;" dir="ltr">
<div style="font-size: 18px;" dir="ltr">
  <p style="font-weight: bold;">Hello,</p>
  <p>You have got a new message from <a href="https://www.example.com/">Example.com</a><br /><br />.You could check your message on <a href="https://www.example.com/en/manager/inbox.html">Manager</a></p>
  <p><img src="https://www.example.com/assets/images/logo-default-120x50.png" width="120px" height="80px" />
    <div style="color: #D4192D; font-weight: bold;">Example.com Team</div>
  </p>
</div>
</html>


Details:

\[ - a [ char
((\w+)\b[^]]*) - Group 1 ($1): one or more word chars (the tag name, captured into Group 2), then a word boundary and zero or more chars other than the ] char
] - a ] char
((?>(?!\[\2\b).|(?R))*) - Group 3 ($3): zero or more repetitions of either any char that does not start a [ + Group 2 value (as a whole word) char sequence, or the whole pattern recursed
\[\/\2] - [/ string, Group 2 value, ] char.

This is the pattern that handles paired tags. The second pattern handles non-paired tags:

\[ - a [ char
([^]]*) - Group 1 ($1): any zero or more chars other than ]
] - a ] char.
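
As a small self-contained check of the two patterns described above, the loop can be wrapped in a function and run on a short input. This is only a sketch: the function name bbToHtmlRecursive and the sample string are made up for illustration, and error handling is omitted (preg_replace returns null on errors such as hitting the backtrack limit).

```php
<?php
// Hypothetical helper name; the patterns are the ones explained above.
function bbToHtmlRecursive(string $text): string
{
    $paired = '~\[((\w+)\b[^]]*)\]((?>(?!\[\2\b).|(?R))*)\[\/\2]~s';
    $previous = null;
    // Repeat until a pass changes nothing, so nested pairs get converted too.
    while ($previous !== $text && preg_match($paired, $text)) {
        $previous = $text;
        $text = preg_replace($paired, '<$1>$3</$2>', $text);
    }
    // Anything still in brackets has no closing tag: emit it as self-closing.
    return preg_replace('~\[([^]]*)]~', '<$1 />', $text);
}

$html = bbToHtmlRecursive('[p]Hi [b]there[/b][br][/p]');
echo $html, "\n"; // <p>Hi <b>there</b><br /></p>
```

The first pass converts the outer [p] pair, the second pass converts [b], and the final replacement turns the leftover [br] into a self-closing tag.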

CodePudding user response：

It can't be done in a single pass: the tags are nested, and a pattern can't match the same substring more than once.
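
To see the single-pass limitation concretely, here is a minimal sketch with a naive pattern. The pattern and the sample string are illustrative assumptions, not taken from the question or the other answer: a lazy match plus a backreference so that the closing name equals the opening one.

```php
<?php
// Illustrative pattern only: lazy content, closing tag name must equal the opening one.
$naive = '~\[(\w+)\](.*?)\[/\1\]~s';

$input = '[div][p]Hi[/p][/div]';
// The 5th argument receives the number of replacements made in this call.
$onePass = preg_replace($naive, '<$1>$2</$1>', $input, -1, $count);

echo $onePass, "\n"; // <div>[p]Hi[/p]</div>
echo $count, "\n";   // 1
```

The outer [div] pair consumes the whole string in one match, so the inner [p] pair is part of an already matched substring and is left untouched by that pass.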

One solution is to start the replacement with the innermost tags (tags that contain no other bracketed tags). For that you don't need a recursive pattern; you only need to forbid opening brackets in the part of the pattern that describes the text content.

$sentence = '[html style="font-size: 18px;" dir="ltr"][div style="font-size: 18px;" dir="ltr"][p style="font-weight: bold;"]Hello,[/p][p]You have got a new message from [a href="https://www.example.com/"]Example.com[/a][br][br].You could check your message on [a href="https://www.example.com/en/manager/inbox.html"]Manager[/a][/p][p][img src="https://www.example.com/assets/images/logo-default-120x50.png" width="120px" height="80px"][div style="color: #D4192D; font-weight: bold;"]Example.com Team[/div][/p][/div][/html]';

// proceed to the replacement of all self-closing tags first
$result = preg_replace('~\[ (br|hr|img)\b ([^]]*) ]~xi', '<$1$2/>', $sentence);


// then replace the innermost tags until there's nothing to replace
$count = 0;
do {
    $result = preg_replace('~
        \[ ( (\w+) [^]]* ) ]     # opening tag
        ( [^[]*  )               # content without other bracketed tags
        \[/ \2 ]                 # closing tag
    ~xi', '<$1>$3</$2>', $result, -1, $count);
} while ($count);

echo $result;


The 5th parameter of preg_replace is passed by reference and receives the number of replacements performed ($count here). This variable is used as the condition of the do...while loop: when $count is 0, there is nothing left to replace.
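
To make the role of $count visible, the same two steps can be wrapped in a function that also records the count of each pass. This is a sketch under assumptions: the function name convertInnermostFirst and the short sample input are hypothetical, and the tag-name group is written as (\w+).

```php
<?php
// Hypothetical helper: same two steps as above, also returning the per-pass counts.
function convertInnermostFirst(string $text): array
{
    // Step 1: self-closing tags.
    $result = preg_replace('~\[ (br|hr|img)\b ([^]]*) ]~xi', '<$1$2/>', $text);

    // Step 2: innermost paired tags, repeated until a pass replaces nothing.
    $counts = [];
    do {
        $result = preg_replace(
            '~\[ ( (\w+) [^]]* ) ] ( [^[]* ) \[/ \2 ]~xi',
            '<$1>$3</$2>',
            $result,
            -1,
            $count // set by preg_replace to the number of replacements in this pass
        );
        $counts[] = $count;
    } while ($count);

    return [$result, $counts];
}

[$html, $counts] = convertInnermostFirst('[div][p]Hi[/p][br][/div]');
echo $html, "\n";                 // <div><p>Hi</p><br/></div>
echo implode(',', $counts), "\n"; // 1,1,0
```

The first pass converts [p], the second converts [div] (its content no longer contains an opening bracket), and the third pass finds nothing, which ends the loop.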