Regex to replace forward slash if it isn't within an anchor/link tag

Time:09-27

Given a string containing sequences such as some/text <a href="/some/text">some/text</a>, I need to insert text after each forward slash that is not within a link.

Note that there would be multiple instances of the above in the string; only the forward slash is the target.

As a starting point I tried /(\w)(\/{1})(\w)/ with replacement $1$2INSERT$3, but this also replaces slashes within the link. I'm not sure how to exclude everything between <a*> and </a>.

Desired outcome:

some/INSERTtext <a href="/some/text">some/text</a>

CodePudding user response:

Maybe it is a little bit convoluted, but you may try this regex if you're using PHP. It also works for nested/paired tags.

  • Regex
(?:(<(\S+)[^<>]*>(?:[^<>]|(?1))*<\/\2>)|<[^<>]*>)(*SKIP)(*F)|\/
  • Substitution
/INSERT

The idea is to match all tags first and then ignore them, then you can match / safely.

Check the test cases.
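For anyone not on PCRE: Python's built-in re module has neither (*SKIP)(*F) nor (?1) recursion, so the pattern above will not compile there. A rough equivalent of the same "match the tags first, then ignore them" idea is an alternation plus a replacement callback. This is only a sketch under assumptions: the function name insert_after_slash and the marker parameter are made up for illustration, it protects only <a>...</a> elements and the insides of single tags, and it does not handle nested anchors or malformed markup.

```python
import re

# Alternatives are tried left to right at each position:
#   1. a whole <a ...>...</a> element (kept as-is)
#   2. any other single tag, so slashes in attributes or in </p> are kept
#   3. a bare forward slash (the only thing that gets rewritten)
PATTERN = re.compile(r"<a\b[^>]*>.*?</a\s*>|<[^<>]*>|/", re.IGNORECASE | re.DOTALL)


def insert_after_slash(html, marker="INSERT"):
    """Insert `marker` after every slash that is outside tags and outside links."""

    def repl(match):
        text = match.group(0)
        # Anything that starts with "<" is a tag or an anchor element: leave it alone.
        return text if text.startswith("<") else "/" + marker

    return PATTERN.sub(repl, html)


print(insert_after_slash('some/text <a href="/some/text">some/text</a>'))
```

On the sample string from the question this prints some/INSERTtext <a href="/some/text">some/text</a>. Unlike the PCRE version, slashes inside other paired elements such as <p>a/b</p> are rewritten here, because only anchors are skipped as a whole.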

CodePudding user response:

Dealing with HTML with regexp is hard (actually impossible, but let's assume that the limited subset you want is possible). You need:

  • ungreedy matching
  • lookahead and lookbehind, so you skip forward slashes that follow an opening angle bracket that has not been closed yet

Look at this one for example
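The linked pattern is not reproduced in this copy. As an independent sketch of the lookahead idea (my own pattern, not the answerer's), the following Python version uses two negative lookaheads: one rejects a slash that sits inside a tag, the other rejects a slash that is followed by </a> with no new <a in between, i.e. a slash inside link text. The name insert_outside_links is hypothetical, and the pattern assumes well-formed markup with no literal < or > inside attribute values.

```python
import re

SLASH_OUTSIDE_LINKS = re.compile(
    r"/"
    r"(?![^<]*>)"                   # not inside a tag: no ">" before the next "<"
    r"(?!(?:(?!<a\b).)*</a\s*>)",   # not inside link text: no "</a>" before the next "<a"
    re.IGNORECASE | re.DOTALL,
)


def insert_outside_links(html, marker="INSERT"):
    """Insert `marker` after slashes that are outside tags and outside <a>...</a>."""
    return SLASH_OUTSIDE_LINKS.sub("/" + marker, html)


print(insert_outside_links('some/text <a href="/some/text">some/text</a>'))
```

The second lookahead rescans the rest of the string for every slash, so this can get slow on large inputs, which is one more reason to prefer a parser.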

But an even better approach would be not to use regex for this task at all: load the HTML into a DOMDocument, traverse the tree, replace the forward slash only in the text nodes, and get the resulting HTML back.
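DOMDocument is a PHP class. To illustrate the same parser-based idea with nothing but the Python standard library, here is a minimal sketch built on html.parser.HTMLParser. It is a streaming parser rather than a DOM tree, so the code re-emits the markup as it goes and only touches text chunks. The class name SlashRewriter, the function rewrite_text_nodes, and the choice to skip text inside <a> elements (to match the desired outcome in the question) are assumptions for illustration.

```python
from html.parser import HTMLParser


class SlashRewriter(HTMLParser):
    """Re-emit the input HTML, inserting a marker after slashes in text outside links."""

    def __init__(self, marker="INSERT"):
        # convert_charrefs=False keeps entities such as &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.marker = marker
        self.out = []
        self.anchor_depth = 0  # > 0 while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text outside links is modified.
        if self.anchor_depth == 0:
            data = data.replace("/", "/" + self.marker)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_text_nodes(html, marker="INSERT"):
    parser = SlashRewriter(marker)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(rewrite_text_nodes('some/text <a href="/some/text">some/text</a>'))
```

Because attribute values never reach handle_data, the slashes in href are safe without any special handling. End tags are re-emitted in lowercase, so the output may differ cosmetically from the input.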
