NOTE: I have already researched this question heavily on Stack Overflow and have not found a solution! I am unable to apply the other answers to my problem, so I need some help.
The challenge: I want to extract an email address from a string, but I'm having trouble matching only the email address with regex.
The email address I want from the HTML is:
The HTML is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r\n<html xmlns="http://www.w3.org/1999/xhtml">\r\n<html>\r\n<head></head>\r\n<body>\r\n<a name="top"></a>Back to Category Index</a></p>\r\n<p>-----------------------------------<br/></p>\r\n\r\n67)<a name="e1h1" id="e1h1"></a> Summary: Solar Eclipse 2024 Travel\r\n<br/><br/>\r\n<p>Name: laure gem wilson\r\nRoadtrippers\r\n</p>Category: Travel\r\n<br/><br/>\r\nEmail: <a href="mailto:[email protected]">[email protected]</a>\r\n<br/><br/>\r\nOutlet: Roadtrip<br/><br/>\r\nDeadline: 7:00 PM EST - 8 July\r\n<br/><br/>\r\n<p>\r\nQuery: \r\n<br/><br/>\r\nHi, I am on assignment to write a feature about planning a road<br/>trip to experience the Solar Eclipse 2024, including path of<br/>totality, advice about viewing, and recommendations for when and<br/>where to book accommodations, thanks!<br/>\r\n</p>\r\n<p>\r\nRequirements: \r\n<br /><br />\r\nMust be domestic USA<br/>\r\n</p>\r\n<p><a href="#top">Back to Top</a> <a href="#Travel">Back to Category Index</a></p>\r\n<p>-----------------------------------<br/>
My Python code is:
Query_Email = re.findall(r'Email:.+', msg_content[index_counter])
This returns:
<a href="mailto:[email protected]">[email protected]</a>
Authority Magazine<br/><br/>
Answer:
You can grab just the email address from the mailto: part with a lazy match up to the first "> like this:
mailto:(.*?)">
https://regex101.com/r/Xk4Ywk/1
This should capture the email within the group.
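A minimal Python sketch of this approach, run against a shortened copy of the question's HTML. The address someone@example.com and the function name extract_mailto are hypothetical placeholders; in the question's code you would pass msg_content[index_counter] instead of the sample string.

```python
import re

# Non-greedy (.*?) group: capture from just after "mailto:" up to the first '">'.
MAILTO_RE = re.compile(r'mailto:(.*?)">')


def extract_mailto(html):
    """Return the group-1 capture for every mailto: link in the string."""
    return MAILTO_RE.findall(html)


# Shortened, hypothetical version of the question's HTML.
sample = (
    'Category: Travel\r\n<br/><br/>\r\n'
    'Email: <a href="mailto:someone@example.com">someone@example.com</a>\r\n'
)
print(extract_mailto(sample))  # ['someone@example.com']
```

This assumes the href is double-quoted and is the last attribute in the tag: with href="mailto:a@example.com" class="x"> the lazy group would also swallow the trailing attribute text.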
Answer:
If you just want to extract email addresses from arbitrary text, an email regex is one of the most popular regexes and is easy to find: search for 'email regex' and you'll get your answer. I used the first search result and modified it slightly, replacing the ^ and $ text anchors with \b word boundaries:
\b[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\b
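A hedged Python sketch of this pattern used with re.findall; find_emails is a hypothetical helper name. Because the question's HTML contains the same address both in the href and in the link text, the sketch de-duplicates results while keeping their order, which is an assumption about what you want.

```python
import re

# \b word boundaries instead of ^/$ anchors, so addresses can be found
# anywhere inside a longer string (including inside HTML).
EMAIL_RE = re.compile(
    r"\b[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\b"
)


def find_emails(text):
    """Return email-like substrings, de-duplicated in order of appearance."""
    # The group is non-capturing, so findall returns whole matches.
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


line = 'Email: <a href="mailto:someone@example.com">someone@example.com</a>'
print(find_emails(line))  # ['someone@example.com']
```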
BUT if you're trying to extract information from HTML, DO NOT USE REGEX: regex can't reliably parse arbitrary HTML, so use an HTML parser instead :)
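A minimal sketch of the parser-based alternative, using only Python's standard-library html.parser module. MailtoCollector and extract_mailto_emails are hypothetical names, and reading the address from mailto: hrefs (dropping any ?subject=... part) is an assumption based on where the address sits in the question's HTML.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value may be None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any ?subject=... query part.
                address = value[len("mailto:"):].split("?", 1)[0]
                if address and address not in self.emails:
                    self.emails.append(address)


def extract_mailto_emails(html):
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.emails


sample = '<p>Email: <a href="mailto:someone@example.com">someone@example.com</a></p>'
print(extract_mailto_emails(sample))  # ['someone@example.com']
```

Unlike the regex approaches, this does not care about attribute order or quoting style, and it tolerates malformed markup such as the stray </a> in the question's HTML.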