multiline regex with lookahead

Time:09-16

I'm currently trying to parse a log file with a regex. Each log entry begins with a timestamp, followed by an arbitrary multiline message that can contain multiple newlines, carriage returns, and any other characters.

The regex should capture each entry: the timestamp first, then the log message up to the next timestamp. At the moment I do this with a positive lookahead for the next timestamp.

On the website regex101 the pattern works more or less. In our security event manager the same regex doesn't work. I need to save every event with the timestamp as the first capturing group and the log message as the second capturing group.

(\w{3}\s{1}\w{3}\s{1}\d{2}\s{1}\d{2}\:\d{2}\:\d{2}\s{1}\d{4})((\r||.|\n)*)(?=(\w{3}\s{1}\w{3}\s{1}\d{2}\s{1}\d{2}\:\d{2}\:\d{2}\s{1}\d{4}))

Example log:

Tue Sep 14 08:57:47 2021
Thread 1 advanced to log sequence 186 (LGWR switch)
Current log# 2 seq# 186 mem# 0: D:\ORADB\DV1\REDO02A.LOG
Current log# 2 seq# 186 mem# 1: H:\ORADB\DV1\REDO02B.LOG
Tue Sep 14 09:07:40 2021
Thread 1 advanced to log sequence 187 (LGWR switch)
Current log# 3 seq# 187 mem# 0: D:\ORADB\DV1\REDO03A.LOG
Current log# 3 seq# 187 mem# 1: H:\ORADB\DV1\REDO03B.LOG
Tue Sep 14 09:22:09 2021
Thread 1 advanced to log sequence 188 (LGWR switch)
Current log# 4 seq# 188 mem# 0: D:\ORADB\DV1\REDO04A.LOG
Current log# 4 seq# 188 mem# 1: H:\ORADB\DV1\REDO04B.LOG


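A pattern that captures the timestamp and the message separately, built from the parts listed below:

(\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4})([\s\S]*?)(?=\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4}|\Z)

Where: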

  • ( - Start of 1st capturing group
    • \w{3}\s\w{3}\s\d{2}\s - Match Tue Sep 14
    • \d{2}:\d{2}:\d{2}\s\d{4} - Match 08:57:47 2021
  • ) - End of 1st capturing group
  • ( - Start of 2nd capturing group
    • [\s\S]*? - Match any characters, including newlines. The quantifier is non-greedy, so it matches as few characters as possible.
  • ) - End of 2nd capturing group
  • (?= - Start of look ahead assertion
    • \w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4} - The next part must either be a timestamp (the same pattern that matches the timestamp in the first capturing group).
    • | - Or
    • \Z - The next part must be the end of the string, so the last log entry is also matched.
  • ) - End of look ahead assertion. Since the pattern before it is non-greedy, the lookahead stops at the closest timestamp, which is always the next one.
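
The pieces above can be put together and tried outside the event manager. The following is a minimal sketch in Python, assuming the timestamp format from the example log; `parse_entries` is a hypothetical helper name, and the sample text is a shortened copy of the example log. Whether the security event manager's regex engine supports lookahead and `\Z` in the same way is an assumption that has to be checked against its documentation.

```python
import re

# Timestamp shape taken from the example log, e.g. "Tue Sep 14 08:57:47 2021".
TIMESTAMP = r"\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\d{4}"

# Group 1: the timestamp.
# Group 2: the message, matched lazily up to the next timestamp
#          or the end of the string (\Z).
ENTRY = re.compile(
    "(" + TIMESTAMP + r")([\s\S]*?)(?=" + TIMESTAMP + r"|\Z)"
)


def parse_entries(text):
    """Return a list of (timestamp, message) tuples (hypothetical helper)."""
    # strip() only trims the whitespace around the captured message;
    # the raw group 2 still contains the surrounding newlines.
    return [(m.group(1), m.group(2).strip()) for m in ENTRY.finditer(text)]


# Shortened copy of the example log, two entries.
sample = (
    "Tue Sep 14 08:57:47 2021\n"
    "Thread 1 advanced to log sequence 186 (LGWR switch)\n"
    "Current log# 2 seq# 186 mem# 0: D:\\ORADB\\DV1\\REDO02A.LOG\n"
    "Tue Sep 14 09:07:40 2021\n"
    "Thread 1 advanced to log sequence 187 (LGWR switch)\n"
)

for timestamp, message in parse_entries(sample):
    # Print each timestamp with the first line of its message.
    print(timestamp, "->", message.splitlines()[0])
```

Without the `|\Z` alternative the second entry in the sample would not be returned, because no timestamp follows it.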