My Regex pattern:
^(?<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\.[0-9]{1,6}) (?<levelname>[A-Z]) ?(ERR:)? ?(?<source>\[.*\])? (?<message>.*)
Test_String_1 = "Oct 25 14:24:29.700799 I [System] Connected"
Output as per my regex pattern:
Match groups:
timestamp Oct 25 14:24:29.700799
levelname I
source [System]
message Connected
Test_String_2 = "Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!"
Output as per my regex pattern:
Match groups:
timestamp Oct 25 14:24:30.315344
levelname E
source [[Signal]]
message Valid Shared Mem!
However, I am expecting the results below for Test_String_1 and Test_String_2:
Test_String_1:
Match groups:
timestamp Oct 25 14:24:29.700799
levelname I
source System
message Connected
Test_String_2:
Match groups:
timestamp Oct 25 14:24:30.315344
levelname E
source Signal
message Valid Shared Mem!
What changes should I make to my regex pattern to get the expected result? I'm using https://rubular.com/ for regex testing.
[Edit]: Test_String_3 = "Oct 25 14:24:29.653900 D Connection refused"
Expected output:
Match groups:
timestamp Oct 25 14:24:29.653900
levelname D
source
message Connection refused
CodePudding user response:
I think this is what you want:
^(?<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}\:[0-9]{1,2}\:[0-9]{1,2}\.[0-9]{1,6}) (?<levelname>[A-Z]) ?(ERR:)? ?(\[*(?<source>\w*)\]*) (?<message>.*)
The bracket patterns wrap the source group from the outside, so the brackets themselves are matched but not captured in the source field.
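As a quick check, the pattern above can be run against the three test strings from the question. The sketch below is a translation to Python's re module, which is an assumption on my part: the named groups are rewritten as (?P<name>...) because Python's re module does not accept Ruby's bare (?<name>...) form. The names PATTERN, LINES and parse are illustrative only.

```python
import re

# The pattern from this answer, rewritten with Python-style named groups.
# The unnecessary backslashes before ':' are dropped; behaviour is unchanged.
PATTERN = re.compile(
    r"^(?P<timestamp>[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,6})"
    r" (?P<levelname>[A-Z]) ?(ERR:)? ?(\[*(?P<source>\w*)\]*) (?P<message>.*)"
)

# Test strings copied from the question.
LINES = [
    "Oct 25 14:24:29.700799 I [System] Connected",
    "Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!",
    "Oct 25 14:24:29.653900 D Connection refused",
]


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


for line in LINES:
    print(parse(line))
```

The first two strings give System and Signal as the source. For Test_String_3, which has no brackets, \w* takes the first word, so source becomes Connection and message becomes refused. If the unbracketed case matters, the source part needs to be optional as a whole rather than only the brackets.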
CodePudding user response:
You can match the following regular expression.
^(?P<timestamp>[JFMASOND][a-z]{2} [0123]\d [012]\d(?::[0-5]\d){2}\.\d{6}\b) (?P<levelname>[A-Z]) +(?:[A-Z]+: +)?(?:\[+(?P<source>[A-Za-z]+)\]+)? *(?P<message>.+)
Notice that I've made the capture group source optional.
Depending on requirements, some adjustments may be needed. I assumed, for example, that the source capture group would contain a single word, and that any non-space text between levelname and source (or message) would consist of one or more capital letters followed by a colon, as in the second example ('ERR:'). I've also made assumptions about how rigorously the timestamp format must be specified and which capture groups should be optional. These were of course just guesses about the specification, as they were not spelled out in the question.
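To make the timestamp assumption concrete, here is a small Python sketch comparing the lenient timestamp sub-pattern from the question with the stricter one used in this answer. Only the first sample comes from the question; the other three are hypothetical inputs I made up to show where the two differ. LOOSE, STRICT, SAMPLES and accepts are illustrative names.

```python
import re

# Timestamp sub-pattern from the question: variable field widths.
LOOSE = re.compile(
    r"[a-zA-Z]{3} [0-9]{1,2} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,6}"
)
# Timestamp sub-pattern from this answer: fixed widths, restricted leading digits.
STRICT = re.compile(
    r"[JFMASOND][a-z]{2} [0123]\d [012]\d(?::[0-5]\d){2}\.\d{6}\b"
)

SAMPLES = [
    "Oct 25 14:24:29.700799",  # format used in the question
    "Oct 5 14:24:29.700799",   # hypothetical: single-digit day
    "Oct 25 14:24:29.7007",    # hypothetical: fewer than six fractional digits
    "oct 25 14:24:29.700799",  # hypothetical: lowercase month
]


def accepts(pattern, text):
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None


for sample in SAMPLES:
    print(f"{sample!r}: loose={accepts(LOOSE, sample)} strict={accepts(STRICT, sample)}")
```

Both patterns accept the first sample. The lenient pattern also accepts the other three, while the strict one rejects them, so which one is appropriate depends on how uniform the log files really are.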
The regular expression can be broken down as follows. Note that I have put individual spaces in character classes ([ ]) merely to make them visible to the reader. I've tested this with Python, for which named capture groups are written (?P<name>...); to use it in Ruby (as on rubular.com), write the named groups as (?<name>...) instead.
^ # match beginning of string
(?P<timestamp> # begin 'timestamp' capture group
[JFMASOND] # match a cap letter in the char class
[a-z]{2} # match two lowercase letters
[ ] # match a space
[0123]\d # match a digit in the char class then any digit
[ ] # match a space
[012]\d # match a digit in the char class then any digit
(?: # begin a non-capture group
: # match a colon
[0-5]\d # match a digit in the char class then any digit
){2} # end non-capture group and execute it twice
\. # match a period
\d{6} # match 6 digits
\b # match a word boundary
) # end timestamp capture group
[ ] # match a space
(?P<levelname> # begin 'levelname' capture group
[A-Z] # match a capital letter
) # end 'levelname' capture group
[ ]+ # match >= 1 spaces
(?:[A-Z]+:[ ]+)? # optionally match >= 1 capital letters, a colon,
# then >= 1 spaces
(?: # begin non-capture group
\[+ # match one or more left brackets
(?P<source> # begin capture group 'source'
[A-Za-z]+ # match >= 1 chars in char class
) # end capture group 'source'
\]+ # match one or more right brackets
)? # end non-capture group and make optional
[ ]* # match >= 0 spaces
(?P<message>.+) # match rest of line and save to capture
# group 'message'
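The breakdown above maps naturally onto Python's re.VERBOSE flag, which lets the pattern keep its comments. The following is a minimal sketch under the assumptions stated earlier (single-word source, optional 'ERR:'-style tag, six-digit fraction), using the quantifiers the comments describe ("one or more", ">= 1"). LOG_LINE, LINES and parse_line are illustrative names.

```python
import re

LOG_LINE = re.compile(
    r"""
    ^
    (?P<timestamp>
        [JFMASOND][a-z]{2}    # month abbreviation
        [ ][0123]\d           # day of month
        [ ][012]\d            # hour
        (?::[0-5]\d){2}       # minutes and seconds
        \.\d{6}\b             # six fractional digits
    )
    [ ]
    (?P<levelname>[A-Z])      # single capital letter
    [ ]+
    (?:[A-Z]+:[ ]+)?          # optional tag such as 'ERR:'
    (?:\[+(?P<source>[A-Za-z]+)\]+)?   # optional source, brackets not captured
    [ ]*
    (?P<message>.+)           # rest of the line
    """,
    re.VERBOSE,
)

# Test strings copied from the question.
LINES = [
    "Oct 25 14:24:29.700799 I [System] Connected",
    "Oct 25 14:24:30.315344 E ERR: [[Signal]] Valid Shared Mem!",
    "Oct 25 14:24:29.653900 D Connection refused",
]


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


for line in LINES:
    print(parse_line(line))
```

For the third string the optional group does not participate in the match, so Python reports source as None rather than an empty string; use match.groupdict(default="") if an empty string is preferred.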