I am trying to get a list of URLs starting with "https://www.crocodiletrading.co.uk/" from an HTM file. I also need to capture anything that comes after the main URL, for example /blog/name-of-blog.
I am using Notepad and Regex to try to accomplish this, but I am struggling because I don't really understand Regex.
I've tried something like this: .*?(https\:\/\/www\.[a-zA-Z0-9\.\/\-]+)
Can anyone let me know how I can accomplish this?
I'm getting a list of the URLs that have been flagged as broken so I can then use this to set up 301 redirects.
Here is the HTML FILE if anyone wants to take a look.
Thanks in advance.
CodePudding user response:
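This function logs the href of every anchor tag on the page (<a href="link to some page"></a>):
const getAllLinks = () => {
    // Select every <a> element in the document
    const links = document.querySelectorAll("a");
    links.forEach(link => {
        // link.href is the fully resolved (absolute) URL
        console.log(link.href);
    });
};
// Call it, e.g. from the browser console with the HTM file open
getAllLinks();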
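If only the crocodiletrading.co.uk links are wanted, the same idea could be narrowed down by hostname. This is just a sketch: `filterLinksByHost` is a made-up helper name, and it assumes the site's hostname is exactly `www.crocodiletrading.co.uk`, as in the question.

```javascript
// Hypothetical helper: keep only the hrefs whose hostname equals `host`.
function filterLinksByHost(hrefs, host) {
  return hrefs.filter((href) => {
    try {
      return new URL(href).hostname === host;
    } catch (e) {
      // new URL() throws on relative or empty values; skip those
      return false;
    }
  });
}

// In the browser console it might be used like this:
// const hrefs = Array.from(document.querySelectorAll("a"), (a) => a.href);
// console.log(filterLinksByHost(hrefs, "www.crocodiletrading.co.uk").join("\n"));
```

Comparing the parsed hostname rather than searching the whole string avoids picking up links that merely mention the domain somewhere in a query string.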
CodePudding user response:
Here is what I ended up doing instead, using good old jQuery to grab the URLs that contain crocodiletrading.co.uk:
jQuery( document ).ready( function() {
    var arr = [];
    var i = 0;
    // Collect the href of every link that points at crocodiletrading.co.uk
    jQuery('a[href*="crocodiletrading.co.uk"]').each(function() {
        arr[i++] = jQuery(this).attr('href');
    });
    // Join the URLs into list markup and log it to the console
    var list = '<ul class="myList"><li class="ui-menu-item" role="menuitem"><a class="ui-all" tabindex="-1">' + arr.join('</a></li><li>') + '</li></ul>';
    console.log(list);
});
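For the regex route the question started with, something along these lines might also work on the raw file contents, without loading the page in a browser. It is only a sketch: `extractSiteUrls` is a hypothetical name, and it assumes a URL ends at the first whitespace, quote, or angle bracket, which is a simplification.

```javascript
// Hypothetical helper: pull every crocodiletrading.co.uk URL out of raw HTML text.
function extractSiteUrls(html) {
  // Match the main URL plus whatever follows it, up to the first
  // whitespace, quote, or angle bracket (assumed to end the URL).
  const pattern = /https:\/\/www\.crocodiletrading\.co\.uk[^\s"'<>]*/g;
  const matches = html.match(pattern) || []; // match() returns null when nothing is found
  return [...new Set(matches)]; // drop duplicates, keep first-seen order
}

// Example with a small inline snippet standing in for the HTM file:
const sample =
  '<a href="https://www.crocodiletrading.co.uk/blog/name-of-blog">Blog</a>' +
  '<a href="https://example.com/other">Other</a>';
console.log(extractSiteUrls(sample).join("\n"));
```

The same pattern without the surrounding slashes and `g` flag should be usable in an editor's regex search, though the exact escaping rules depend on the editor.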