Getting links from markdown file even if link is in the link text

Time:11-15

I have a Markdown file with some links in it, and I'm trying to grab all of them along with their link text. This works fine for simple links, but I can't figure out how to match a link whose text is itself an image link.

If I have an image link like [![alt text](https://example.com/image.svg)](https://other-example.com), I'd like to grab both links and both link texts.

I came up with two regexes:

  • /\[([^\]!]+)]\((https:\/\/[^\)]+)\)/gi
  • /\[([^\[!]+)](\(https:\/\/[^\)]+\))/gi

let str = `# Title with Image [![alt text](https://example.com/image.svg)](https://other-example.com)

## Links

- [First](https://somesite.com/path/to-page) - voluptates explicabo debitis aspernatur dolor, qui dolores.
- [Second](https://example.io/this/is/page) - molestiae animi eius nisi quam quae quisquam beatae reiciendis.`

let regex1 = /\[([^\]!]+)]\((https:\/\/[^\)]+)\)/gi
let regex2 = /\[([^\[!]+)](\(https:\/\/[^\)]+\))/gi
let links1 = [...str.matchAll(regex1)].map((m) => ({ text: m[1], link: m[2] }))
let links2 = [...str.matchAll(regex2)].map((m) => ({ text: m[1], link: m[2] }))

console.log(links1)
console.log(links2)

The expected result would be (order doesn't matter):

[
  {
    "text": "![alt text](https://example.com/image.svg)",
    "link": "https://other-example.com"
  },
  {
    "text": "alt text",
    "link": "https://example.com/image.svg"
  },
  {
    "text": "First",
    "link": "https://somesite.com/path/to-page"
  },
  {
    "text": "Second",
    "link": "https://example.io/this/is/page"
  }
]


CodePudding user response:

([^\]!]+) blocks you from matching ![alt text](https://example.com/image.svg), so I replaced it with (!\[.+?\]\(.+?\)|.+?), which first tries to match an image such as ![alt text](https://example.com/image.svg) and otherwise falls back to any text wrapped in [] (square brackets), such as alt text.

/(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi

Note that to get overlapping matches, you need to wrap the pattern in a positive lookahead. A regular match consumes the whole outer link, so the image link nested inside it would never be found.
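As a minimal sketch of why the lookahead matters, the snippet below runs the same pattern twice on the nested image link from the question: once as an ordinary consuming match and once wrapped in (?=...). The variable names are made up for illustration.

```javascript
const nested = "[![alt text](https://example.com/image.svg)](https://other-example.com)";

// Consuming version: the first match swallows the whole outer link,
// so the scan resumes after it and never sees the inner image link.
const consuming = /\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\)/g;

// Lookahead version: every match is zero-width, so the scan advances one
// character at a time and can match again at the inner "[".
const overlapping = /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/g;

const consumingTexts = [...nested.matchAll(consuming)].map((m) => m[1]);
const overlappingTexts = [...nested.matchAll(overlapping)].map((m) => m[1]);

console.log(consumingTexts);   // ["![alt text](https://example.com/image.svg)"]
console.log(overlappingTexts); // ["![alt text](https://example.com/image.svg)", "alt text"]
```

Because the lookahead itself matches an empty string, the useful data lives only in the capture groups (m[1] and m[2]); m[0] is always empty.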

let str = `# Title with Image [![alt text](https://example.com/image.svg)](https://other-example.com)

## Links

- [First](https://somesite.com/path/to-page) - voluptates explicabo debitis aspernatur dolor, qui dolores.
- [Second](https://example.io/this/is/page) - molestiae animi eius nisi quam quae quisquam beatae reiciendis.`

let regex = /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi

let links = [...str.matchAll(regex)].map((m) => ({ text: m[1], link: m[2] }))

console.log(links)
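
For reuse, the same pattern can be wrapped in a small function. This is only a sketch: extractLinks is a hypothetical name, and the list descriptions in the sample are shortened placeholders. It assumes every link starts with https:// and that each link sits on a single line, since . does not match newlines.

```javascript
// Hypothetical helper; the pattern is the lookahead regex from this answer.
function extractLinks(markdown) {
  const pattern = /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi;
  return [...markdown.matchAll(pattern)].map((m) => ({ text: m[1], link: m[2] }));
}

const markdown = `# Title with Image [![alt text](https://example.com/image.svg)](https://other-example.com)

## Links

- [First](https://somesite.com/path/to-page) - first item.
- [Second](https://example.io/this/is/page) - second item.`;

const found = extractLinks(markdown);
for (const { text, link } of found) {
  console.log(`${text} -> ${link}`);
}
```

On this sample it returns four entries: the outer link with the image as its text, the image itself, and the two plain links.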
