I'm trying to convert plain-text links (fully qualified URLs only, i.e. no relative links) in Markdown text to Markdown links. It works fine except when the source Markdown already contains the URL as a link. For example, this is the source text:
Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.
(https://example.com/abc/1.html)
Also have a look at https://example.com/abc/1.html
My regex: /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim
Expected: match only the second and third links. Current outcome: all three URLs are matched, including the link text of the already-converted first link.
I tried adding a negative lookahead at the end, similar to the negative lookbehind at the beginning, but that just omits the last character of the URL, which is a bummer!
I'm using this in NodeJS.
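To make the problem reproducible, here is a minimal sketch that runs the pattern against the sample text. It assumes the character classes in the pattern contain `+` (as in the commonly circulated version of this URL regex); the variable names `source`, `urlPattern` and `matches` are only illustrative.

```javascript
// Sample text from the question, one entry per line.
const source = [
  'Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.',
  '(https://example.com/abc/1.html)',
  'Also have a look at https://example.com/abc/1.html',
].join('\n');

// The question's pattern (assumption: "+" belongs inside both character classes).
const urlPattern = /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

// With the g flag, match() returns every full match.
const matches = source.match(urlPattern);
console.log(matches);
```

The lookbehind only rejects the URL that follows `](`, so the URL used as link text inside `[...]` is still reported as the first of three matches.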
Where:
(?<!\]\() - Lookbehind assertion to ensure this is not the y in [x](y)
( - Capture URL part
(?:https?|ftp):\/\/ - Match the http/ftp part of the URL
[^\s\]\)]* - Match the remaining part of the URL
) - End of capturing of URL
(?: - Non-capturing group
[\s\]\)] - Match either a whitespace character, closing bracket, or closing parenthesis. The reason we need to match the closing bracket/parenthesis is to allow URLs in the format e.g. (Check https://google.com) or [Check https://google.com]
(?!\() - Lookahead assertion to ensure this is not the x in [x](y)
| - Or
$ - End of string
) - End of non-capturing group
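The full pattern is not shown in the text above, so the sketch below assembles it by concatenating the listed components in order; treat that assembled pattern as an assumption. Because the non-capturing group consumes one trailing delimiter, the replacement callback puts that character back after the generated link. `linkify` is a hypothetical helper name.

```javascript
// Assumption: the components listed above, concatenated in order.
const urlPattern = /(?<!\]\()((?:https?|ftp):\/\/[^\s\]\)]*)(?:[\s\]\)](?!\()|$)/g;

// Hypothetical helper: wrap each bare URL in a Markdown link.
function linkify(text) {
  return text.replace(urlPattern, (match, url) => {
    // The match may end with one delimiter (whitespace, "]" or ")")
    // that is not part of the URL; keep it after the link.
    const trailing = match.slice(url.length);
    return `[${url}](${url})${trailing}`;
  });
}

const source = [
  'Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.',
  '(https://example.com/abc/1.html)',
  'Also have a look at https://example.com/abc/1.html',
].join('\n');

const linked = linkify(source);
console.log(linked);
```

On the sample text, the existing Markdown link on the first line is left alone, while the parenthesised URL and the bare URL are each wrapped in a link.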
CodePudding user response:
You can use a pattern to match what you do not want, and capture what you do want in group 1.
You can then use the callback function of replace to build the replacement.
In the callback, check whether group 1 exists. If it does, replace with your custom replacement. If it does not, return the full match unchanged.
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)
In parts, the pattern matches:
\[ - Match [
(?:https?|ftp):\/\/ - Match one of the protocols and ://
[^\]\[]+ - Match 1+ times any char except [ and ]
\] - Match ]
\([^()]*\) - Match from ( till )
| - Or
((?:https?|ftp):\/\/\S+) - Capture in group 1 a URL-like format
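As a rough sketch of the callback approach in Node.js: the first alternative consumes an existing Markdown link without capturing anything, so group 1 is `undefined` for it and the match is returned untouched. The pattern below assumes the quantifiers are `+`; `linkify` and the sample sentence are only illustrative.

```javascript
// Alternative 1 matches an existing [url](...) link (no capture group).
// Alternative 2 captures a bare URL in group 1.
const pattern = /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)/g;

// Hypothetical helper name.
function linkify(text) {
  return text.replace(pattern, (match, url) =>
    // Group 1 is undefined when the existing-link alternative matched.
    url === undefined ? match : `[${url}](${url})`
  );
}

const input =
  'Login in to My site [https://example.com/](https://example.com/) and also have a look at https://example.com/abc/1.html';

const output = linkify(input);
console.log(output);
```

Note that `\S+` also consumes a closing parenthesis directly after a URL, so this version is only suitable for URLs that are not wrapped in parentheses.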
To not match parentheses in the URL:
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)
Or specifically capture a URL between parentheses:
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)
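With the three-alternative pattern there are two capture groups, so the callback has to check both. The following is a minimal sketch under the assumption that the quantifiers are `+`; `linkify`, `inParens` and `bare` are hypothetical names, and keeping the surrounding parentheses in the output is one possible choice of replacement.

```javascript
// Alternative 1: existing [url](...) link, no capture group.
// Alternative 2: URL wrapped in parentheses, captured in group 1.
// Alternative 3: bare URL without parentheses, captured in group 2.
const pattern =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)/g;

// Hypothetical helper name.
function linkify(text) {
  return text.replace(pattern, (match, inParens, bare) => {
    if (inParens !== undefined) {
      // Keep the original parentheses around the new link.
      return `([${inParens}](${inParens}))`;
    }
    if (bare !== undefined) {
      return `[${bare}](${bare})`;
    }
    // Neither group participated: an existing Markdown link, leave as-is.
    return match;
  });
}

const source = [
  'Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.',
  '(https://example.com/abc/1.html)',
  'Also have a look at https://example.com/abc/1.html',
].join('\n');

const linked = linkify(source);
console.log(linked);
```

On the sample text from the question, only the second and third URLs are converted, which is the expected outcome described there.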