Regex to convert text URLs in Markdown to Links

Time:09-22

I'm trying to convert plain-text URLs (fully qualified only, i.e. no relative links) in Markdown text into Markdown links. It works fine except when the source Markdown already contains some of the URLs as links. For example, this is the source text:

Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.
(https://example.com/abc/1.html)

Also have a look at https://example.com/abc/1.html

My regex: /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim

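Expected: match only the second and third links, i.e. the two plain URLs. Current outcome: three URLs match, because the URL inside the square brackets of the existing link is picked up as well.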
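
To make the problem reproducible, here is a minimal Node.js sketch that runs the question's pattern over the sample text. It assumes the character classes contain a literal `+` (the usual form of this URL pattern); the names `source`, `questionRegex` and `found` are only illustrative.

```javascript
// Sample Markdown from the question.
const source = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

// The question's pattern. Assumption: both character classes contain a literal "+".
const questionRegex =
  /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

// Collect every match together with its offset in the source.
const found = [...source.matchAll(questionRegex)].map((m) => ({
  url: m[0],
  index: m.index,
}));

console.log(found.map((f) => f.url));
// The lookbehind only rejects the URL that follows "](", so the URL inside
// "[...]" is still reported along with the two plain ones.
```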

I tried adding a negative lookahead at the end, similar to the negative lookbehind at the beginning but that just omits the last character of the URL which is a bummer!
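
The truncation is a backtracking effect, which a small sketch can show. The exact lookahead that was tried is not given, so `(?!\])` below is an assumption, as are the names `withLookahead` and `truncated`.

```javascript
// The question's pattern with a trailing negative lookahead added.
// Assumption: the lookahead tried was (?!\]); the original is not shown.
const withLookahead =
  /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])(?!\])/gim;

const alreadyLinked = "[https://example.com/](https://example.com/)";

// When the lookahead fails right before "]", the engine does not give up.
// It backtracks one character and retries, so a shorter URL is accepted.
const truncated = alreadyLinked.match(withLookahead);

console.log(truncated); // → [ 'https://example.com' ]  (trailing "/" lost)
```

So a lookahead on its own cannot reject the URL; it can only push the end of the match back until the assertion passes.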

I'm using this in NodeJS.

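The following pattern, whose parts are broken down below, skips URLs that are already part of a Markdown link:

(?<!\]\()((?:https?|ftp):\/\/[^\s\]\)]*)(?:[\s\]\)](?!\()|$)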

Where:

  • (?<!\]\() - Lookbehind assertion to ensure that this is not the y in [x](y)
  • ( - Capture URL part
    • (?:https?|ftp):\/\/ - Match the http/ftp part of the URL
    • [^\s\]\)]* - Match the remaining part of the URL.
  • ) - End of capturing of URL
  • (?: - Non-capturing group
    • [\s\]\)] - Match either a space character, closing bracket, or closing parenthesis. The reason we need to match the closing bracket/parenthesis is to allow URLs in the format e.g. (Check https://google.com) or [Check https://google.com]
    • (?!\() - Lookahead assertion to ensure that this is not the x in [x](y)
    • | - Or
    • $ - End of String
  • ) - End of non-capturing group
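
As a rough sketch of how the parts above fit together in Node.js: the pattern below is assembled piece by piece from that list, and the `gm` flags, the function name `linkifyPlainUrls` and the `[url](url)` output format are assumptions. Because the trailing space, bracket or parenthesis is consumed by the match but is not part of the captured URL, the callback puts it back.

```javascript
// Assembled from the breakdown above. The "gm" flags are an assumption.
const urlPattern =
  /(?<!\]\()((?:https?|ftp):\/\/[^\s\]\)]*)(?:[\s\]\)](?!\()|$)/gm;

// Hypothetical helper: wraps each plain URL as [url](url).
function linkifyPlainUrls(markdown) {
  return markdown.replace(urlPattern, (match, url) => {
    // Whatever followed the URL inside the match (space, "]", ")" or nothing).
    const trailing = match.slice(url.length);
    return `[${url}](${url})${trailing}`;
  });
}

const sample = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

console.log(linkifyPlainUrls(sample));
// The existing link on the first line is left alone; the two plain URLs are wrapped.
```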

CodePudding user response:

You can use a pattern to match what you do not want, and capture what you do want in group 1.

You can pass a callback function as the replacement argument of replace.

In the callback, check if group 1 exists. If it does, return your custom replacement. If it does not, return the full match unchanged.

\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)

In parts the pattern matches:

  • \[ Match [
  • (?:https?|ftp):\/\/ Match one of the protocols and ://
  • [^\]\[]+ Match 1+ times any char except [ and ]
  • \] Match ]
  • \([^()]*\) Match from ( till )
  • | Or
  • ((?:https?|ftp):\/\/\S+) Capture in group 1 a URL-like format
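
The approach described above can be sketched like this in Node.js. It assumes the quantifiers in the pattern are `+`; the function name `convertBareUrls` and the `[url](url)` replacement are illustrative choices.

```javascript
// First alternative: an existing [url](url) link, matched but not captured.
// Second alternative: a bare URL, captured in group 1.
const skipOrCapture =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)/g;

// Hypothetical helper name.
function convertBareUrls(markdown) {
  return markdown.replace(skipOrCapture, (match, url) =>
    // Group 1 is undefined when the first alternative matched.
    url !== undefined ? `[${url}](${url})` : match
  );
}

const text =
  "Login [https://example.com/](https://example.com/) or see https://example.com/abc/1.html";

console.log(convertBareUrls(text));
// → Login [https://example.com/](https://example.com/) or see [https://example.com/abc/1.html](https://example.com/abc/1.html)
```

The existing link is consumed by the first alternative, so the second alternative never gets a chance to start inside it.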

To not match parenthesis in the url:

\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)
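
A quick comparison shows why the character class matters for a URL wrapped in parentheses. This is only a sketch: the names `greedyTail`, `noParens` and `captureGroup1` are made up for the example, and the quantifiers are assumed to be `+`.

```javascript
// Variant ending in \S+ (any non-whitespace), and variant ending in [^()\s]+.
const greedyTail =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)/g;
const noParens =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)/g;

const wrapped = "(https://example.com/abc/1.html)";

// Returns what group 1 captured for every match of the given pattern.
const captureGroup1 = (re) => [...wrapped.matchAll(re)].map((m) => m[1]);

console.log(captureGroup1(greedyTail)); // → [ 'https://example.com/abc/1.html)' ]
console.log(captureGroup1(noParens));   // → [ 'https://example.com/abc/1.html' ]
```

The trade-off is that a URL that legitimately contains parentheses is cut off at the first one.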

Or specifically capture a url between parenthesis:

\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)
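
With this variant there are two capture groups, so the callback has to check both. A minimal sketch, assuming `+` quantifiers and assuming you want to keep the surrounding parentheses in the output (`convertAllUrls` is a hypothetical name):

```javascript
// Alternative 1: existing link (no capture).
// Alternative 2: URL between parentheses, captured in group 1.
// Alternative 3: bare URL without parentheses, captured in group 2.
const threeWay =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)/g;

function convertAllUrls(markdown) {
  return markdown.replace(threeWay, (match, inParens, bare) => {
    // The match consumed the parentheses, so they are added back here.
    if (inParens !== undefined) return `([${inParens}](${inParens}))`;
    if (bare !== undefined) return `[${bare}](${bare})`;
    return match; // existing Markdown link, left untouched
  });
}

const doc = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

console.log(convertAllUrls(doc));
```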
