I am confused as to how I can go about finding all substrings that match this pattern: Mon Aug 20 16

Time:03-11

I have a string that contains a date along with other text, and I want to find every substring that matches this date pattern. I believe a regex can do this, but I am not sure how to write it.

Example: "Expiration Date: Mon Aug 20 16:07:24 2029 word other gibberish word Expiration Date: Mon Aug 20 16:08:16 2029 word gibberish word"

I do not want to find exactly this string. I want to find all date-like values.

CodePudding user response:

If you want to find all date-like values in the same format as your example, here is one way to do so:

(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{2} \d{2}:\d{2}:\d{2} \d{4}

In Python:

import re

data = "Expiration Date: Mon Aug 20 16:07:24 2029 word other gibberish word Expiration Date: Mon Aug 20 16:08:16 2029 word gibberish word"

print(re.findall(r'(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) '
                 r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) '
                 r'\d{2} \d{2}:\d{2}:\d{2} \d{4}',
                 data))
# ['Mon Aug 20 16:07:24 2029', 'Mon Aug 20 16:08:16 2029']
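The pattern above only checks the shape of the text, so something like "Mon Feb 31 10:00:00 2029" would also match. If real dates are needed, one possible follow-up is to pass each match through datetime.strptime and drop the ones that fail. This is only a sketch: extract_dates is a hypothetical helper name, the second sample string is made up for illustration, and %a/%b assume English day and month abbreviations (the default C locale). The pattern also still requires a two-digit day.

```python
import re
from datetime import datetime

# Same pattern as in the answer above, compiled once for reuse.
DATE_RE = re.compile(
    r'(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) '
    r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) '
    r'\d{2} \d{2}:\d{2}:\d{2} \d{4}'
)

def extract_dates(text):
    """Hypothetical helper: return a datetime for every match that parses."""
    dates = []
    for candidate in DATE_RE.findall(text):
        try:
            # Assumes English abbreviations for %a and %b.
            dates.append(datetime.strptime(candidate, '%a %b %d %H:%M:%S %Y'))
        except ValueError:
            # Date-shaped but not a real date (e.g. Feb 31): skip it.
            continue
    return dates

# Made-up sample: the second value has the right shape but is not a real date.
sample = ("Expiration Date: Mon Aug 20 16:07:24 2029 word "
          "Expiration Date: Mon Feb 31 10:00:00 2029 word")
print(extract_dates(sample))
```

Only the first value survives here, because February 31 does not exist.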

CodePudding user response:

If you want to match every date-like string between "Expiration Date: " and " word", you could use this regex:

(?<=Expiration Date: ).*?(?= word)

Of course, this assumes that " word" always comes right after the date you want to match. The positive lookbehind and positive lookahead keep "Expiration Date: " and " word" out of the matched text, and the lazy quantifier .*? stops at the first " word", so each date is returned as a separate match instead of one big match.


Resulting code:

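>>> import re
>>> INPUT_STRING = "Expiration Date: Mon Aug 20 16:07:24 2029 word other gibberish word Expiration Date: Mon Aug 20 16:08:16 2029 word gibberish word"
>>> re.findall(r"(?<=Expiration Date: ).*?(?= word)", INPUT_STRING)
['Mon Aug 20 16:07:24 2029', 'Mon Aug 20 16:08:16 2029']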
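To see what the lazy quantifier contributes, here is a small comparison on the string from the question. It is only an illustration: the variable names are made up, and the capture-group version is an alternative way to write the same idea, not something the answer above uses.

```python
import re

text = ("Expiration Date: Mon Aug 20 16:07:24 2029 word other gibberish word "
        "Expiration Date: Mon Aug 20 16:08:16 2029 word gibberish word")

# Lazy .*? stops at the first " word" after each "Expiration Date: ".
lazy = re.findall(r"(?<=Expiration Date: ).*?(?= word)", text)

# Greedy .* runs to the last " word" in the string, giving one long match.
greedy = re.findall(r"(?<=Expiration Date: ).*(?= word)", text)

# Alternative without lookarounds: findall returns just the captured group.
grouped = re.findall(r"Expiration Date: (.*?) word", text)

print(lazy)
print(len(greedy))
print(grouped == lazy)
```

The greedy version returns a single match that swallows everything between the first "Expiration Date: " and the final " word".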