I have this sample log data in a list:
data = ["[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
        "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
        "asdadadad adasdas3453 454234 fsdf53",
        "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
        'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
        "2022/10/24 13:29:15 [error] 234 ssdfsd 435345"]
This is what I have tried so far to print the index of each item that contains a date:
for index, elem in enumerate(data):
    if ']' and '[' in elem:
        print(f'Date found at index: {index}')
Current output:
Date found at index: 0
Date found at index: 5
Expected Output:
Date found at index: 0
Date found at index: 1
Date found at index: 3
Date found at index: 4
Date found at index: 5
CodePudding user response:
Since the only part these date formats really have in common is the time, I'd look for the time with a regex:
import re

for index, entry in enumerate(data):
    # whitespace or "T", then HH:MM:SS, then optional fractional seconds
    if re.search(r'(\s|T)[0-9]{2}:[0-9]{2}:[0-9]{2}([.,][0-9]+)*', entry):
        print(f"Found date in {index}")
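If you want the indices back as a list rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: the names `TIME_RE` and `indices_with_time` are made up for illustration, and it assumes every date in your logs is accompanied by an `HH:MM:SS` time that follows whitespace or an ISO `T`.

```python
import re

# Hypothetical names. The pattern looks for HH:MM:SS preceded by whitespace
# or an ISO "T", optionally followed by fractional seconds (".579" or ",436").
TIME_RE = re.compile(r"(?:\s|T)\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")


def indices_with_time(lines):
    """Return the indices of the lines that contain a time stamp."""
    return [i for i, line in enumerate(lines) if TIME_RE.search(line)]


data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345",
]

for index in indices_with_time(data):
    print(f"Date found at index: {index}")
```

Note that a line with a date but no time, or a line that starts directly with the time, would be missed by this pattern. As for the original check, `']' and '[' in elem` is evaluated as `']' and ('[' in elem)`, so it only tests for `[`, which is why only indices 0 and 5 were reported.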
CodePudding user response:
Take a look at this module (I'm not the owner): https://github.com/ishirav/date-detector. There are also links to similar projects there.
CodePudding user response:
You can use this collection of regular expressions:
import re
data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345"
]
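exps = [
    r"(\d{4}[-/]\d{2}[-/]\d{2})",  # year / month / day
    r"(\d{2}:\d{2}:\d{2})",  # hour : min : sec
    # a month name and a weekday name anywhere in the line; three-letter
    # prefixes so that both "sep" and "september", "tue" and "tuesday" match
    r"((?=.*(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*(mon|tue|wed|thu|fri|sat|sun)))",
]
p = re.compile('|'.join(exps))
for d in data:
    # lower-case the line so the month and weekday names match in any case
    if p.search(d.lower()):
        print(d)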
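If you also want to know which kind of date was found and at which index, one option is to keep the patterns separate and labelled. The sketch below is an assumption-laden variant, not a complete date detector: the labels, the `PATTERNS` dict and the `classify` helper are invented for illustration, and the named-date pattern only covers the "Mon, 24 Oct 2022" layout from the sample. It anchors the weekday and month names with word boundaries so that ordinary words containing "mar" or "mon" do not count as dates.

```python
import re

# Hypothetical labels and patterns; extend them for other log formats.
PATTERNS = {
    # 2022-10-24 or 2022/10/24, not embedded in a longer run of digits
    "numeric date": re.compile(r"(?<!\d)\d{4}[-/]\d{2}[-/]\d{2}(?!\d)"),
    # 13:29:48, not embedded in a longer run of digits
    "clock time": re.compile(r"(?<!\d)\d{2}:\d{2}:\d{2}(?!\d)"),
    # weekday, day of month, month name, e.g. "Mon, 24 Oct"
    "named date": re.compile(
        r"\b(?:mon|tue|wed|thu|fri|sat|sun)[a-z]*,?\s+\d{1,2}\s+"
        r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\b",
        re.IGNORECASE,
    ),
}


def classify(line):
    """Return the labels of all patterns that occur in the line."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(line)]


data = [
    "[2022-08-15 17:42:32,436: INFO/MainProces] lorqw q addadasdasdasdad",
    "2022-10-24T13:29:50.579Z dasdadasdasdadadadadaddada",
    "asdadadad adasdas3453 454234 fsdf53",
    "Mon, 24 Oct 2022 13:29:48 GMT express:router expressInit : /health",
    'time="2022-10-24T13:29:12Z" level=error msg="checking config status failed: sdadasd"',
    "2022/10/24 13:29:15 [error] 234 ssdfsd 435345",
]

for index, line in enumerate(data):
    labels = classify(line)
    if labels:
        print(f"Date found at index: {index} ({', '.join(labels)})")
```

Keeping one compiled pattern per label costs a few extra searches per line, but it makes it easy to see which rule fired when a line is matched unexpectedly.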