I have some input data like:
<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1">title url</a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a></b> test data.</li>
<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a></b> test data.</li>
I need to extract the anchor tags together with their inner HTML. That is, I need everything from <a href to </a> using regular expressions, regardless of what the anchor tag contains.
So the expected output is as follows:
Output of first input:
<a href="/wiki/wiki_url" title="title1">title url</a>
<a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a>
Output of second input:
<a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a>
<a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a>
Can anyone help me solve this using a regular expression?
CodePudding user response:
You can use this regex:
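<a\b[^>]*>[\s\S]*?<\/a>
This regex matches every a tag, whether or not its content contains line breaks. The \b keeps other tags that start with "a" (such as abbr) from matching, and [\s\S] matches any character, including newlines.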
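As a minimal sketch of how a pattern of this kind could be applied in JavaScript: the helper name extractAnchors and the sample string are made up for illustration, and the pattern assumed here is <a\b[^>]*>[\s\S]*?<\/a>. It would likely break on nested anchors or on a ">" inside an attribute value, so treat it as a starting point rather than a robust HTML parser.

```javascript
// Hypothetical helper: returns every <a ...>...</a> substring found in `html`.
function extractAnchors(html) {
  // \b stops tags such as <abbr> from matching; [\s\S]*? lazily matches any
  // characters, including line breaks, up to the nearest </a>.
  const anchorPattern = /<a\b[^>]*>[\s\S]*?<\/a>/gi;
  // String.prototype.match returns null when nothing matches.
  return html.match(anchorPattern) || [];
}

// Shortened sample input, assumed for illustration only.
const sample =
  '<li>Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> ' +
  '<i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">' +
  'Title url Charges</a></b> test data.</li>';

extractAnchors(sample).forEach((tag) => console.log(tag));
```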
CodePudding user response:
If you only want to use regex here, it could look like the following:
const regex = /<a\b[^<>]*>[\s\S]*?<\/a>/g
(tested on this website)
However, JavaScript can also access DOM elements directly by their tag name. For instance, in your case:
const aTags = Array.from(document.getElementsByTagName("a"))
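To get from those elements to the markup shown in the question, each element's outerHTML could be read. A rough sketch, assuming the markup is already part of a document; anchorMarkup is a hypothetical helper name, and root is assumed to be a Document or Element:

```javascript
// Hypothetical helper: `root` is assumed to be a Document or Element.
function anchorMarkup(root) {
  // getElementsByTagName returns a live HTMLCollection; Array.from copies it,
  // and the mapping function reads each element's serialized markup.
  return Array.from(root.getElementsByTagName("a"), (a) => a.outerHTML);
}

// In a browser:
// anchorMarkup(document).forEach((html) => console.log(html));
```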
CodePudding user response:
Avoid regex for this task. Use outerHTML instead.
Edit: you can parse the string into an HTML document and still use outerHTML to access the link information.
const str = '<li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1">title url</a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges">Title url Charges</a></b> test data.</li><li style="-moz-float-edge: content-box">Test text <a href="/wiki/wiki_url" title="title1"><h1><b>title url</b></h1></a> <i>(pictured)</i> is <b><a href="/wiki/wiki_url_charges" title="Title2 charges"><img alt="About this image" src="//imgs.wikimedia.org/static-image/desc20.png" style="border: none;" /></a></b> test data.</li>';
const parser = new DOMParser();
const html = parser.parseFromString(str, 'text/html');
const links = html.querySelectorAll('li a');
links.forEach(link => console.log(link.outerHTML));