I am trying to do some HTML scraping with JavaScript, and would like to take the link from an a href
and turn it into a hyperlink in a Discord embed. I am having trouble with regex, which I am finding very difficult to learn.
I assume I will also need another regex to capture the whole tag so I can replace it with my desired output?
This is an example of the raw HTML I have:
An **example**, also known as a <a href="https://www.example.com/example type">example type</a>
To make this readable within a Discord embed, I am looking for this output:
An **example**, also known as a [**example type**](https://www.example.com/example type)
I have tried extracting the URL via regex, and I can match it. However, I am having trouble extracting both the link and the link text (I think it's called the target? The 'example type' in the example) and then replacing the string with my desired output. I have the following (https://regexr.com/73574):
/href="[^"]+/g
This matches href="https://www.example.com/example type, which feels like a very early step: it includes 'href' in the match, and it does not capture the target.
EDIT: I apologise, I did not think about additional cases. What if the string has multiple links, with text after them? For example:
An **example**, also known as a <a href="https://www.example.com/example type">example type</a> is the first example, and now I have <a href="https://www.example.com/second">second</a> example
with a desired output of:
An **example**, also known as a [**example type**](https://www.example.com/example type) is the first example, and now I have [**second**](https://www.example.com/second) example
CodePudding user response:
Try this: (?<=href=")[^"]*
The lookbehind verifies that the text immediately before the match is href=" without including it in the match.
Demo: https://regex101.com/r/2qMnPt/1
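As a rough sketch of how that pattern could be used from JavaScript: the snippet below collects every href value from a string. It assumes an engine with lookbehind support (ES2018 or later, such as a current Node.js), and `extractHrefs` is a hypothetical helper name chosen for illustration. Note that this only gets you the URLs, not the link text.

```javascript
// Lookbehind: match the characters after href=" up to the next quote,
// without including href=" in the match itself.
const hrefRegex = /(?<=href=")[^"]*/g;

// Hypothetical helper: returns all href values, or an empty array if none.
const extractHrefs = (html) => html.match(hrefRegex) || [];

const sample =
  'An **example**, also known as a <a href="https://www.example.com/example type">example type</a> ' +
  'is the first example, and now I have <a href="https://www.example.com/second">second</a> example';

console.log(extractHrefs(sample));
// → [ 'https://www.example.com/example type', 'https://www.example.com/second' ]
```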
CodePudding user response:
You can use regular expression groups to capture things that interest you. My regular expression here might be far from perfect but I don't think that's important here - it shows you a way and you can always improve it if needed.
Things you have to do:
- prepare a regex that captures the groups you need (anchor tag, anchor text, anchor URL),
- remove the anchor tag completely from the text,
- inject the anchor text and anchor href into the final string.
Here's a quick code example of that:
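// Group 1: the full anchor tag, group 2: the href value, group 3: the link text.
const anchorRegex = /(<a\shref="([^"]+)">(.+?)<\/a>)/i;
const textToBeParsed = `An **example**, also known as a <a href="https://www.example.com/example type">example type</a>`;

const parseText = (text) => {
  // Run the regex against the argument, not the outer variable.
  const matches = anchorRegex.exec(text);
  if (!matches) {
    console.warn("Something went wrong...");
    return;
  }
  // Index 0 is the whole match, so skip it and take the three groups.
  const [, fullAnchorTag, anchorUrl, anchorText] = matches;
  // Remove the anchor tag, then append the Markdown link at the end.
  // This only handles a single link that sits at the end of the text.
  const textWithoutAnchorTag = text.replace(fullAnchorTag, '');
  return `${textWithoutAnchorTag}[**${anchorText}**](${anchorUrl})`;
};

console.log(parseText(textToBeParsed));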
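For the case in the EDIT (several links with text after them), one possible variation is to let String.prototype.replace do the substitution in place with a global regex and a replacer function. This is only a sketch: `anchorPattern` and `anchorsToMarkdown` are names made up for illustration, and the pattern assumes every anchor has a double-quoted href as its only attribute, as in the examples above. For arbitrary HTML, a real parser would be more robust than a regex.

```javascript
// Group 1: the href value, group 2: the link text.
// The g flag makes replace() process every anchor, not just the first.
const anchorPattern = /<a\s+href="([^"]+)">(.+?)<\/a>/gi;

// Hypothetical helper: swaps each anchor for a bold Markdown link
// at the same position, leaving the surrounding text untouched.
const anchorsToMarkdown = (html) =>
  html.replace(anchorPattern, (_match, url, text) => `[**${text}**](${url})`);

const multiLinkText =
  'An **example**, also known as a <a href="https://www.example.com/example type">example type</a> ' +
  'is the first example, and now I have <a href="https://www.example.com/second">second</a> example';

console.log(anchorsToMarkdown(multiLinkText));
```

Because the replacement happens where each tag was found, text that follows a link stays in its original position, and a string without any anchors is returned unchanged.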