Replace consecutive pseudo tags with HTML tags with shared styles

Time:11-18

I have this line

a[link], a[link] a[link] text text text a[link] text a[link] text

I want to find the links that appear before the text and give them one style (there are three in this example, but there may be more or fewer), then find the remaining links that appear after the text and give them a different style.

I was only able to match the first three links, and I am not sure how good my pattern is:

<?php
$re = '/^(a\[(\w+[\s+]?)+\],?\s?)+/iu';
$str = 'a[link], a[link] a[link] text text text a[link] text a[link] text';
preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);
var_dump($matches);
?>

Here is an illustrative example of what I need. Given this text:

a[link1], a[link2] a[link3] text text text a[link4] text a[link5] text

the links are written as a[...]. I need to replace them so that the result looks like this:

<a href="link1" class="style1">link1</a><a href="link2" class="style1">link2</a><a href="link3" class="style1">link3</a> text text text <a href="link4" class="style2">link4</a> text <a href="link5" class="style2">link5</a> text

The first three links get the class style1. The links that come after the text get the class style2.

There can be one, three, four, or any other number of links before the text, and any number of links in any order after it.

CodePudding user response:

Don't try to match everything at once. Match each link individually, then iterate over the results. Use preg_match_all for this, or preg_replace_callback if you want to replace each match. Using:

a\[(\w+)\]

should achieve your goal.

It was unclear what [\s+]? was meant to do; it optionally allows one whitespace character or a literal +. The purpose of the optional comma and space after each link was also unclear. Keeping it simple is the best approach.

https://3v4l.org/2AvT1
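
A minimal sketch of the match-then-iterate approach, using the pattern a\[(\w+)\] and the sample string from the question. The function name convertLinks and the rule for choosing a class are assumptions added for illustration and are not taken from the linked demo: a link gets style1 as long as only commas and whitespace separate it from the start of the string, and style2 otherwise.

```php
<?php
// Hypothetical helper: converts a[...] pseudo tags one match at a time.
function convertLinks(string $str): string
{
    // Collect every link together with its byte offset in the string.
    preg_match_all('/a\[(\w+)\]/', $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $out = '';
    $pos = 0;         // end offset of the previous match
    $leading = true;  // still inside the run of links at the start?

    foreach ($matches as $m) {
        [$full, $start] = $m[0];
        // Text between the previous link (or the string start) and this link.
        $gap = substr($str, $pos, $start - $pos);

        // Assumption: only commas and whitespace may separate leading links.
        if ($leading && preg_match('/^[,\s]*$/', $gap)) {
            $class = 'style1';  // separators between leading links are dropped
        } else {
            $leading = false;
            $class = 'style2';
            $out .= $gap;       // ordinary text is kept unchanged
        }

        $target = $m[1][0];
        $out .= "<a href=\"$target\" class=\"$class\">$target</a>";
        $pos = $start + strlen($full);
    }

    // Append whatever follows the last link.
    return $out . substr($str, $pos);
}

echo convertLinks('a[link1], a[link2] a[link3] text text text a[link4] text a[link5] text');
```

Doing the classification in PHP rather than in the pattern keeps the regex as simple as the answer suggests; the cost is a little bookkeeping with offsets.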

CodePudding user response:

With PHP you can use the \G anchor together with two capture groups to distinguish the links at the beginning from the other links.

\Ga\h*\[([^][]*)],?\h*|\[([^][]*)]

Explanation

  • \G Assert the current position at the start of the string, or at the end of the previous match
  • a\h* Match a and optional horizontal whitespace chars
  • \[([^][]*)] Match [...] and capture in group 1 what is in between the square brackets
  • ,?\h* match an optional comma and horizontal whitespace chars
  • | Or
  • \[([^][]*)] Match [...] and capture in group 2 what is in between the square brackets

See a regex demo.
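
As a rough sketch of how the two groups could drive a replacement, the following uses preg_replace_callback with PREG_UNMATCHED_AS_NULL (PHP 7.4 or later). The pattern is an adapted variant, not the one above: the second branch also consumes the literal a so that it does not stay in the output, and the separator after a leading link is consumed only when another link follows, so the space before the text is preserved. Both changes, and the class names in the callback, are assumptions made for this example.

```php
<?php
// Adapted variant of the \G pattern (assumption, see the note above):
//   branch 1: a link that starts where the previous match ended (group 1)
//   branch 2: any other link (group 2)
$re = '/\Ga\h*\[([^][]*)](?:,?\h*(?=a\h*\[))?|a\h*\[([^][]*)]/';
$str = 'a[link1], a[link2] a[link3] text text text a[link4] text a[link5] text';

$result = preg_replace_callback(
    $re,
    function (array $m): string {
        // Group 1 is non-null when the \G branch matched, which for this
        // input means the link belongs to the run at the start of the string.
        if ($m[1] !== null) {
            return "<a href=\"{$m[1]}\" class=\"style1\">{$m[1]}</a>";
        }
        return "<a href=\"{$m[2]}\" class=\"style2\">{$m[2]}</a>";
    },
    $str,
    -1,
    $count,
    PREG_UNMATCHED_AS_NULL
);

echo $result;
```

One caveat of relying on \G: a later link that follows another link with no separator at all would also match the first branch, so this sketch assumes later links are separated by at least one character.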

CodePudding user response:

Use preg_replace_callback() to capture and replace the desired strings in one consolidated process.

Use a lookbehind containing a closing square brace in the first capture group to differentiate between the first match of a continuous series versus a subsequent member of the same continuous series.

The first capture group ($m[1]) will be null for the first link of a series and an empty string for every subsequent link in the same series.

The second capture group ($m[2]) will be the glue characters between continuous links.
The third capture group ($m[3]) will be the link's targeted text.

Every time you encounter the first link of a series ($m[1] is null), increment the style counter.

Code: (Demo)

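// sample input from the question
$string = 'a[link1], a[link2] a[link3] text text text a[link4] text a[link5] text';
$styleCounter = 0;
echo preg_replace_callback(
         '/((?<=]))?(,? ?)a\[([^][]*)]/',
         function ($m) use (&$styleCounter) {
             // $m[1] is null only when the link does not directly follow a "]",
             // i.e. when it starts a new series
             if ($m[1] === null) {
                 ++$styleCounter;
             }
             return "{$m[2]}<a href=\"{$m[3]}\" class=\"style{$styleCounter}\">{$m[3]}</a>";
         },
         $string,
         -1,
         $count,
         PREG_UNMATCHED_AS_NULL   // requires PHP 7.4+
     );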

The pattern:

/            #starting delimiter
((?<=]))?    #greedily, optionally match the zero-width position where the previous character was a literal "]" as capture group 1 
(,? ?)       #match an optional comma followed by an optional space as capture group 2
a\[          #match a literal "a" then "[" 
([^][]*)     #match zero or more non-square-brace characters as capture group 3
]            #match a literal "]"
/            #ending pattern delimiter
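
To make the effect of the counter visible, here is a small sketch that wraps the same pattern in a function and runs it on the question's sample string. The function name styleSeries is hypothetical, and the code assumes PHP 7.4 or later for the flags argument.

```php
<?php
// Hypothetical wrapper so the counter starts at zero on every call.
function styleSeries(string $string): string
{
    $styleCounter = 0;
    return preg_replace_callback(
        '/((?<=]))?(,? ?)a\[([^][]*)]/',
        function (array $m) use (&$styleCounter): string {
            // No "]" directly before the match: a new series of links starts.
            if ($m[1] === null) {
                ++$styleCounter;
            }
            // The glue ($m[2]) is written back in front of the generated tag.
            return "{$m[2]}<a href=\"{$m[3]}\" class=\"style{$styleCounter}\">{$m[3]}</a>";
        },
        $string,
        -1,
        $count,
        PREG_UNMATCHED_AS_NULL
    );
}

echo styleSeries('a[link1], a[link2] a[link3] text text text a[link4] text a[link5] text');
```

For this input the first three links share style1, link4 gets style2, and link5 gets style3, because every stretch of text between links starts a new series. The comma and spaces between consecutive links are kept in the output.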