Identifying different date patterns in python list

Time:10-26

I have this sample log data in a list:

data = ["[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
"2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
"asdadadad adasdas3453 454234 fsdf53",
"Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
"2022/10/24 13:29:15 [error] 234 ssdfsd 435345"]

Here is what I tried so far to print the index of each item that contains a date:

for index, elem in enumerate(data):
    if ']' and '[' in elem:
        print(f'Date found at index: {index}')

Current output:

Date found at index: 0
Date found at index: 5

Expected Output:

Date found at index: 0
Date found at index: 1
Date found at index: 3
Date found at index: 4
Date found at index: 5

CodePudding user response:

Since the only real repeating part of the date is the time, I'd chase the time with a regex:

import re

for index, entry in enumerate(data):
  # a time such as " 17:42:32,436" or "T13:29:50.579", preceded by whitespace or "T"
  if re.search(r'(\s|T)[0-9]{2}:[0-9]{2}:[0-9]{2}([.,][0-9]+)*', entry):
    print(f"Found date in {index}")
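To get exactly the output format asked for in the question, the time-of-day idea above can be wrapped in a small helper. This is only a sketch: the names `TIME_RE` and `indexes_with_time` are made up for illustration, and the fractional-seconds part is assumed to be "a dot or comma followed by one or more digits".

```python
import re

# Time of day (HH:MM:SS) preceded by whitespace or "T", optionally followed
# by fractional seconds such as ",436" or ".579" (assumed form).
TIME_RE = re.compile(r"(\s|T)[0-9]{2}:[0-9]{2}:[0-9]{2}([.,][0-9]+)*")


def indexes_with_time(entries):
    """Return the indexes of the entries that contain an HH:MM:SS time."""
    return [i for i, entry in enumerate(entries) if TIME_RE.search(entry)]


data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345",
]

for index in indexes_with_time(data):
    print(f"Date found at index: {index}")
```

With the sample list this prints indexes 0, 1, 3, 4 and 5. One caveat: because the pattern requires whitespace or "T" before the time, a line that starts directly with the time (nothing in front of it) would not be reported.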

CodePudding user response:

Take a look at this module (I'm not the owner): https://github.com/ishirav/date-detector There are also links to similar projects there.

CodePudding user response:

You can use this collection of regular expressions:

import re

data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345"
]

exps = [
    r"(\d{4}[-/]\d{2}[-/]\d{2})", # year / month / day
    r"(\d{2}:\d{2}:\d{2})",       # hour : min : sec
    r"((?=.*(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*(mon|tue|wed|thu|fri|sat|sun)))",  # month name and weekday name anywhere in the line
]

p = re.compile('|'.join(exps))
for d in data:
    if p.search(d.lower()):
        print(d)
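If you also want the index and a hint about which kind of pattern matched, one possible variation is to give each alternative a named group and read `Match.lastgroup`. This is a sketch, not a drop-in replacement: the labels in `PATTERNS` and the function `find_dates` are hypothetical, and the `rfc_style` pattern is a stricter stand-in for the lookahead-based month/weekday check above (it expects the shape "Mon, 24 Oct 2022"). It uses `re.IGNORECASE` instead of lowercasing each line.

```python
import re

# Hypothetical labels for each pattern; inner groups are non-capturing so
# that Match.lastgroup always names the alternative that matched.
PATTERNS = {
    "iso_date": r"\d{4}[-/]\d{2}[-/]\d{2}",   # year, month, day
    "clock_time": r"\d{2}:\d{2}:\d{2}",       # hour, minute, second
    "rfc_style": (
        r"(?:mon|tue|wed|thu|fri|sat|sun),\s+\d{1,2}\s+"
        r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s+\d{4}"
    ),
}

COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items()),
    re.IGNORECASE,
)


def find_dates(lines):
    """Return (index, pattern_name) for each line whose first match is a date or time."""
    results = []
    for index, line in enumerate(lines):
        match = COMBINED.search(line)
        if match:
            results.append((index, match.lastgroup))
    return results


data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345",
]

for index, kind in find_dates(data):
    print(f"Date found at index: {index} ({kind})")
```

For the sample list, index 3 is reported as `rfc_style` and indexes 0, 1, 4 and 5 as `iso_date`; index 2 is skipped. Only the leftmost match in each line is reported, so a line that has both a date and a time is labelled by whichever comes first.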