I am new to regex, so any help is really appreciated.
I have an expression to identify a URL:
(http[^'\"]+)
Unfortunately, on some URLs I get additional square brackets at the end. For instance: "http://example.com]]"
As the result, I want to receive "http://example.com"
How do I get rid of those brackets with the help of the regex I wrote above?
CodePudding user response:
What you actually have is called a negated character class, so just add characters that should not be matched. In addition, there's not really a need for a capturing group. That said, you could use
http[^'"\]\[]+
#       ^^^^
Note that this excludes square brackets anywhere in the URL, not just at the end. See a demo on regex101.com.
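As a rough illustration of the negated character class, here is a minimal Python sketch. The names `URL_PATTERN` and `extract_urls` are made up for this example, and the pattern assumes a `+` quantifier after the class so that it matches one or more characters.

```python
import re

# Negated character class: the match stops at a quote or a square bracket.
URL_PATTERN = re.compile(r"""http[^'"\]\[]+""")

def extract_urls(text):
    """Return every run that starts with 'http' and ends before a quote or bracket."""
    return URL_PATTERN.findall(text)

print(extract_urls('link: "http://example.com]]"'))  # ['http://example.com']

# Brackets inside the URL end the match too, not only trailing ones.
print(extract_urls('"http://example.com/a[1]/b"'))   # ['http://example.com/a']
```

The second call shows the trade-off mentioned above: a URL that legitimately contains brackets gets cut short.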
CodePudding user response:
Stop the match between a word and nonword character:
(http[^'"]+)\b
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    http                     'http'
--------------------------------------------------------------------------------
    [^'"]+                   any character except: ''', '"' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
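
A small Python sketch of the word-boundary approach, for anyone who wants to try it outside a regex tester. The names `URL_AT_BOUNDARY` and `extract_urls_at_boundary` are invented for this example. The greedy class first consumes the trailing brackets, then backtracks until `\b` can match between the last word character and the first non-word character.

```python
import re

# Greedy match up to a quote, then backtrack to the last word/non-word boundary.
URL_AT_BOUNDARY = re.compile(r"""(http[^'"]+)\b""")

def extract_urls_at_boundary(text):
    """Return the captured group for every match in text."""
    return URL_AT_BOUNDARY.findall(text)

print(extract_urls_at_boundary('"http://example.com]]"'))  # ['http://example.com']

# Any trailing non-word character is trimmed, including a final slash.
print(extract_urls_at_boundary('"http://example.com/"'))   # ['http://example.com']
```

Unlike excluding brackets in the character class, this keeps brackets in the middle of a URL, but it also drops a trailing `/` or other punctuation, which may or may not be what you want.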