Searching all nested url tags and its inner value

Time:08-07

I have some input data like:

<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1">title url</a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a></b> test data.</li>

<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a></b> test data.</li>

I need to extract the URL (anchor) tags together with their inner HTML. That is, I need everything from <a href to </a> using regular expressions, regardless of what the tag contains.

So, the expected output is as follows:

Output of first input:

<a href="/wiki/wiki_url" title="title1">title url</a>
<a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a>

Output of second input:

<a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a>
<a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a>

Can anyone help me solve this using a regular expression?

CodePudding user response:

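You can use this regex:

<a\b[^>]*>[\s\S]*?<\/a>

This regex matches every a tag, whether or not its content contains line breaks. The [\s\S]*? part lazily matches any character, including newlines, and \b keeps the pattern from matching other tags that start with "a", such as <abbr>.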
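As a minimal sketch of how such a pattern could be applied in JavaScript, the snippet below runs a lazy [\s\S]*? pattern over the second input from the question. The helper name extractAnchors and the variable names are assumptions made for illustration, and the approach still assumes that a tags are never nested inside each other.

```javascript
// Hypothetical helper (not from the original answers): returns every
// <a ...>...</a> substring found in an HTML string.
function extractAnchors(html) {
  // \b keeps the pattern from matching <abbr>, <article>, and similar tags.
  // [\s\S]*? lazily matches any character, including line breaks.
  const pattern = /<a\b[^>]*>[\s\S]*?<\/a>/gi;
  return Array.from(html.matchAll(pattern), (m) => m[0]);
}

const sample =
  '<li>Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> ' +
  '<i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">' +
  '<img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" />' +
  '</a></b> test data.</li>';

const anchors = extractAnchors(sample);
anchors.forEach((a) => console.log(a));
```

Each printed line should be one complete a tag, including any markup nested inside it.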

CodePudding user response:

If you only want to use regex here, it could look like the following:

const regex = /<a\b[^<>]*>[\s\S]*?<\/a>/g

(tested on this website)
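The inner part of the pattern decides whether links that wrap other tags are found. The sketch below compares two variants on the first link of each input; the variable names are made up for illustration, and the patterns are simplified examples rather than a complete HTML matcher.

```javascript
const flat = '<a href="/wiki/wiki_url" title="title1">title url</a>';
const nested = '<a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a>';

// Inner part limited to non-tag characters: only text-only links match.
const textOnly = /<a\b[^<>]*>[^<>]*<\/a>/g;
// Inner part matched lazily with [\s\S]*?: links wrapping other tags match too.
const anyInner = /<a\b[^<>]*>[\s\S]*?<\/a>/g;

console.log(flat.match(textOnly));   // array containing the flat link
console.log(nested.match(textOnly)); // null, because of the nested <h1> and <b>
console.log(nested.match(anyInner)); // array containing the nested link
```

If the links may contain markup, as in the second input, the lazy variant is the one that fits the expected output.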

However, JavaScript can also access DOM elements directly by tag name. In your case:

const aTags = Array.from(document.getElementsByTagName("a"))

CodePudding user response:

Avoid regex for this task. Use outerHTML instead.

Edit: you can parse the string into an HTML document and then use outerHTML to read each link's full markup.

const str = '<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1">title url</a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a></b> test data.</li><li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a></b> test data.</li>';

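// Parse the string into a detached HTML document (DOMParser is a browser API).
const parser = new DOMParser();
const html = parser.parseFromString(str, 'text/html');

// Select every link inside a list item and print its full markup.
const links = html.querySelectorAll('li a');
links.forEach(link => console.log(link.outerHTML));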
