I am trying to match part of a text that looks something like this:
This is dummy text, and file added is [^file.pdf] and this is my format \\[^myfile.png]
The first [^...] is what I want to match (it is actually a file link). If the user types this format manually in the input, it is escaped, as you can see in the second \\[^...]. So I want to get the text between all the [^...] markers, and not match when the bracket is preceded by \.
I have tried [^\\]\[.*\]$, but it is not working. I also tried (?!.*?[\\])\[.*\], which matches the brackets but does not exclude a bracket preceded by a backslash.
I am using Python (3.9.*). Please note that I get this text format from the API, so changing the text format is not a solution.
CodePudding user response:
You used a negated character class, [^\\], which requires a char other than \ in front of your expected matches; this excludes matches at the start of the string. Another issue is the greedy dot, .*. It matches any zero or more chars other than line break chars, as many as possible, so you matched from the first [ to the last ]. You also did not specify that there must be a ^ after the [, which caused strings with no ^ after [ to match as well.
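The three problems described above can be reproduced in isolation. This is only a small sketch: the sample strings are made up for illustration, and the patterns are simplified pieces of the ones from the question (the trailing $ is left out).

```python
import re

# Hypothetical sample strings, chosen to isolate each problem.

# 1. [^\\] must consume one character, so a link at position 0 is missed.
at_start = re.findall(r"[^\\]\[.*\]", "[^a.pdf]")
print(at_start)  # []

# 2. The greedy .* runs from the first [ to the last ].
greedy = re.findall(r"\[.*\]", "x [^a.pdf] y [^b.png] z")
print(greedy)  # ['[^a.pdf] y [^b.png]']

# 3. Without a required ^, ordinary bracketed text matches too.
no_caret = re.findall(r"\[.*\]", "see [note]")
print(no_caret)  # ['[note]']
```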
You can use
(?<!\\)(\[\^[^][]*])
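In Python, the pattern can be tried with re.findall. The snippet below is a minimal sketch that assumes the sample text from the question arrives exactly as shown (with two backslashes before the escaped bracket); the names text, FILE_LINK and links are only illustrative.

```python
import re

# Sample text from the question; a raw string keeps the backslashes literal.
text = r"This is dummy text, and file added is [^file.pdf] and this is my format \\[^myfile.png]"

# Pattern from the answer: skip any [ that is immediately preceded by \.
FILE_LINK = re.compile(r"(?<!\\)(\[\^[^][]*])")

# With a single capturing group, findall returns the group contents.
links = FILE_LINK.findall(text)
print(links)  # ['[^file.pdf]']
```

Note that the capturing group here still includes the surrounding [^ and ].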
See the regex demo. Details:
(?<!\\) - a negative lookbehind that fails the match if there is a \ immediately to the left of the current location
\[\^ - a [^ substring
[^][]* - a negated character class that matches any zero or more chars other than [ and ]
] - a ] char.
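Since the question asks for the text between the brackets, one possible adaptation is to move the capturing group so that it wraps only the negated character class. This variant is not part of the pattern above; it is a sketch, and the function name extract_file_names is hypothetical.

```python
import re

# Adapted pattern (an assumption, not the original): the group now
# captures only what sits between "[^" and "]".
INNER_LINK = re.compile(r"(?<!\\)\[\^([^][]*)]")


def extract_file_names(text: str) -> list[str]:
    """Return the inner text of every unescaped [^...] link."""
    return INNER_LINK.findall(text)


sample = r"file added is [^file.pdf] and this is my format \\[^myfile.png]"
names = extract_file_names(sample)
print(names)  # ['file.pdf']
```

The lookbehind only checks the single character before the [, so any bracket directly preceded by a backslash is skipped, however many backslashes come before it.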