Regex for matching [^whatever-here] and restricting \[^...]

Time:10-01

I am trying to match some part from a text which is something like:

This is dummy text, and file added is [^file.pdf] and this is my format \\[^myfile.png]

The first [^...] is what I want to match (it is actually a file link). If the user types this format manually in the input, it is escaped, as you can see in the second one, \\[^...]. So I want to get the text inside every [^...], but skip any occurrence whose bracket is preceded by a \.

I have tried [^\\]\[.*\]$, but it does not work. I also tried (?!.*?[\\])\[.*\], which matches the brackets but does not exclude the bracket that is preceded by a backslash.

I am using Python (3.9.*). Please note that I get this text from an API, so changing the text format is not an option.

CodePudding user response:

You used a negated character class, [^\\], which requires (and consumes) a char other than \ right before the expected match, so matches at the start of the string are excluded. Another issue is the greedy dot pattern, .*. It matches zero or more chars other than line break chars, as many as possible, so you matched from the first [ to the last ]. Finally, you did not specify that there must be a ^ after the [, so strings with no ^ after [ were matched as well.
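To make the first two problems concrete, here is a small sketch that runs the first attempted pattern against the sample text from the question. The variable names are only illustrative, and the sample is written as a raw string on the assumption that the backslashes shown in the question are literal characters in the API text.

```python
import re

# Sample text from the question (raw string, so the backslashes are literal).
text = r"This is dummy text, and file added is [^file.pdf] and this is my format \\[^myfile.png]"

# First attempt from the question.
attempt = re.compile(r"[^\\]\[.*\]$")

# The greedy .* runs from the first "[" to the last "]", and the leading
# [^\\] also consumes the space in front of the first bracket.
greedy_match = attempt.search(text)
print(repr(greedy_match.group()))

# A link at the very start of the string has no preceding character,
# so [^\\] has nothing to consume and the pattern cannot match at all.
start_match = attempt.search("[^file.pdf]")
print(start_match)
```

The first search returns one long match that swallows both bracketed parts, and the second returns None.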

You can use

(?<!\\)(\[\^[^][]*])

Details:

  • (?<!\\) - negative lookbehind that fails the match if there is a \ immediately to the left of the current location
  • \[\^ - [^ substring
  • [^][]* - a negated character class that matches any zero or more chars other than [ and ]
  • ] - a ] char.
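As a minimal sketch of how the pattern above might be used in Python, the snippet below applies it with re.findall to the sample text from the question. The variable names are illustrative, and the second pattern is a small variation (not part of the answer above) that moves the capturing group so only the text inside [^...] is returned.

```python
import re

# Sample text from the question (raw string, so the backslashes are literal).
text = r"This is dummy text, and file added is [^file.pdf] and this is my format \\[^myfile.png]"

# Pattern from the answer: the whole [^...] token is captured in group 1.
link_pattern = re.compile(r"(?<!\\)(\[\^[^][]*])")
links = link_pattern.findall(text)
print(links)  # ['[^file.pdf]']

# Variation (an assumption about what is wanted): capture only the part
# between "[^" and "]" by moving the group inside the brackets.
name_pattern = re.compile(r"(?<!\\)\[\^([^][]*)]")
names = name_pattern.findall(text)
print(names)  # ['file.pdf']
```

Because the lookbehind does not consume a character, a link at the very start of the string is matched too, and the escaped \\[^myfile.png] is skipped since a \ sits immediately before its [.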