How to extract the date from a paragraph

Time:12-25

I have a large block of text, as shown below:

how are you

On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:


-------------------------------------------------------------
NOTE: Please do not remove email address from the "To" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy.

I want the date and time that appear in the text (Tue, Dec 21, 2021 at 1:51 PM). How can I extract them?

CodePudding user response:

Regular expressions are the general way to do this, but for simplicity, if the format of the text is always the same, you can get the date string by looking for the line that looks like On SOME DATE <Someone<someone's email address>> wrote:. Here is an example implementation:

email = """how are you

On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:


-------------------------------------------------------------
NOTE: Please do not remove email address from the \"To\" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy."""

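# Scan line by line for the reply header: "On <date> <sender> wrote:"
for line in email.splitlines():
    if line.startswith("On ") and line.endswith(" wrote:") and " <" in line:
        # The date sits between the leading "On " and the sender's " <"
        date_string = line[3 : line.index(" <")]
        print(f"Found the date: {date_string!r}")
        break
else:
    # for-else: runs only when the loop finished without hitting break
    print("Could not find the date.")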
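
The answer above names regular expressions as the more general route without showing one. A minimal sketch of that approach follows, assuming the header always has the shape "On <weekday>, <month> <day>, <year> at <h:mm> <AM|PM>" with three-letter English abbreviations. The names DATE_PATTERN and extract_date are hypothetical, and the sample text is shortened from the question.

```python
import re

# Sample text, shortened from the email in the question.
email = """how are you

On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:

-------------------------------------------------------------
NOTE: Please do not remove email address from the "To" line."""

# Assumed layout: "On Tue, Dec 21, 2021 at 1:51 PM ..." at the start of a line.
# re.MULTILINE lets ^ match at the start of every line, not only the first.
DATE_PATTERN = re.compile(
    r"^On (\w{3}, \w{3} \d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M)\b",
    re.MULTILINE,
)


def extract_date(text):
    """Return the date string from the reply header, or None if absent."""
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_date(email))
```

Unlike the line scan, this does not depend on the sender part of the header, but it will miss headers written in another language or date layout.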

CodePudding user response:

Very dirty:

string = """how are you \r\n\r\nOn Tue, Dec 21, 2021 at 1:51 PM 
<abchttp://localhost> wrote:\r\n\r\n\r\n--------------------------------- 
----------------------------\r\nNOTE: Please do not remove email address 
from the"To" line of this email when replying.This address is used to 
capture the email and report it.Please do not remove or change the 
subject line of this email.The subject line of this email contains 
information to refer this correspondence back to the originating 
discrepancy.\r\n"""

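# Paragraphs are separated by blank lines (\r\n\r\n); the header is the second one
parts = string.split("\r\n\r\n")
# Word 0 is the leading "On"; words 1-7 are the date and time
date = ' '.join(parts[1].split(' ')[1:8])
print(date)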
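
Whichever approach extracts the text, the result is still a plain string. If a real date and time value is needed, it can probably be parsed with datetime.strptime. This is only a sketch: the format string is inferred from the single sample in the question, the names DATE_FORMAT and parse_date are hypothetical, and %a, %b and %p depend on the locale (English is assumed).

```python
from datetime import datetime

# Assumed input: the extracted string without the leading "On ".
date_string = "Tue, Dec 21, 2021 at 1:51 PM"

# Format inferred from the sample; %I is the 12-hour clock, %p is AM/PM.
DATE_FORMAT = "%a, %b %d, %Y at %I:%M %p"


def parse_date(text):
    """Parse the header date; raises ValueError if the layout differs."""
    return datetime.strptime(text, DATE_FORMAT)


parsed = parse_date(date_string)
print(parsed.isoformat())
```

The header carries no time zone, so the resulting datetime is naive.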