I am trying to parse log files using regex. The logs look like this:
2022-04-01 00:00:00.0000|DEBUG|LOREM:LOREM|IPSUM:LOREM:LOREMIPSUM Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. Ut condimentum nisl ipsum (Failed:1/Total:5) [10.0000 ms].
2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|lorem ipsum \\SOME-PATH[Lorem Ipsum] (ID:000000-0000-0000-0000). Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. //line return here
Ut condimentum nisl ipsum.
2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|lorem ipsum \\SOME-PATH[Lorem Ipsum] (ID:000000-0000-0000-0000). Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. //line return here
Ut condimentum nisl ipsum.
Here is what I have tried (live version on regex101: https://regex101.com/r/RoDU5L/1):
^(?<timestamp>^[\d-]+\s[\d:.]+)\|DEBUG\|(.*?)?\r?$|.*?(?<path>\\.*\]\s)(?<description>.*)+$ /gm
The problem is that it does not capture the continuation line "Ut condimentum nisl ipsum." that follows the line return.
Thanks for your help
CodePudding user response:
You can use
^(?<timestamp>^[\d-]+\s[\d:.]+)\|DEBUG\|(.*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)*)|.*?(?<path>\\.*\]\s)(?<description>.*)+$
See the regex demo.
The .*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)* part now matches:
.* - any zero or more chars other than line break chars, as many as possible
(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)* - zero or more occurrences of:
  \r?\n(?![\d-]+\s[\d:.]+\|) - a CRLF or LF line ending not immediately followed by a datetime-like pattern and a | right after
  .* - any zero or more chars other than line break chars, as many as possible.
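
To see the continuation-line idea in action, here is a minimal Python sketch. It makes several assumptions: the character classes are meant to repeat (`[\d-]+`, `[\d:.]+`), only the first alternative of the pattern is kept, the `message` group name and the shortened sample text are made up for illustration, and named groups are written `(?P<name>...)` because that is the spelling Python's `re` module expects.

```python
import re

# First alternative of the pattern above, adapted to Python syntax.
# The group name "message" is hypothetical; the original group is unnamed.
LOG_ENTRY = re.compile(
    r"^(?P<timestamp>[\d-]+\s[\d:.]+)\|DEBUG\|"
    # Rest of the first line, then any following lines that do NOT
    # start with a datetime-like prefix followed by "|".
    r"(?P<message>.*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)*)",
    re.MULTILINE,
)

# Shortened sample modelled on the logs in the question (assumed content).
sample = (
    "2022-04-01 00:00:00.0000|DEBUG|LOREM:LOREM|IPSUM first entry [10.0000 ms].\n"
    "2022-04-01 00:00:01.0000|DEBUG|LOREM:IPSUM|lorem ipsum \\\\SOME-PATH[Lorem Ipsum] wrapped\n"
    "Ut condimentum nisl ipsum.\n"
    "2022-04-01 00:00:02.0000|DEBUG|LOREM:IPSUM|third entry"
)

entries = [m.groupdict() for m in LOG_ENTRY.finditer(sample)]

for entry in entries:
    # repr() makes the embedded line break in wrapped entries visible.
    print(entry["timestamp"], repr(entry["message"]))
```

Each match spans one whole log entry, so the wrapped line ends up inside the same `message` as the line before it. If the input ends with a trailing newline, that newline is also absorbed into the last entry, so stripping the message may be worthwhile.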