regex not capturing newline

Time:04-28

I am trying to parse log files using a regex. The logs look like this:

2022-04-01 00:00:00.0000|DEBUG|LOREM:LOREM|IPSUM:LOREM:LOREMIPSUM Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. Ut condimentum nisl ipsum (Failed:1/Total:5) [10.0000 ms].
2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|lorem ipsum \\SOME-PATH[Lorem Ipsum] (ID:000000-0000-0000-0000). Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. //line return here
Ut condimentum nisl ipsum.
2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|lorem ipsum \\SOME-PATH[Lorem Ipsum] (ID:000000-0000-0000-0000). Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel placerat sapien. Suspendisse interdum est nulla, ac interdum sem pellentesque vel. //line return here
Ut condimentum nisl ipsum.

Here is what I have tried (live version on regex101: https://regex101.com/r/RoDU5L/1):

^(?<timestamp>^[\d-]+\s[\d:.]+)\|DEBUG\|(.*?)?\r?$|.*?(?<path>\\.*\]\s)(?<description>.*)+$ /gm

The problem is that it does not capture the last line, "Ut condimentum nisl ipsum."

Thanks for your help

CodePudding user response:

You can use

^(?<timestamp>^[\d-]+\s[\d:.]+)\|DEBUG\|(.*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)*)|.*?(?<path>\\.*\]\s)(?<description>.*)+$

See the regex demo.

The .*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)* part now matches:

  • .* - any zero or more chars other than line break chars, as many as possible
  • (?:\r?\n(?![\d-]+\s[\d:.]+\|).*)* - zero or more occurrences of
    • \r?\n(?![\d-]+\s[\d:.]+\|) - a CRLF or LF line ending that is not immediately followed by a datetime-like pattern and a | right after it
    • .* - any zero or more chars other than line break chars, as many as possible.
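
To see the breakdown above in action, here is a minimal Python sketch of the same idea. It is only an illustration under a few assumptions: it keeps just the first alternative of the pattern, it uses Python's (?P<name>...) spelling for named groups, the group name "message" is made up for this example, and the sample log is a shortened stand-in for the real one.

```python
import re

# First alternative of the pattern only. Python spells named groups
# (?P<name>...); "message" is a hypothetical group name for this sketch.
ENTRY = re.compile(
    r"^(?P<timestamp>[\d-]+\s[\d:.]+)\|DEBUG\|"
    # Rest of the first line, then any following lines that do NOT
    # start with a datetime-like prefix followed by "|".
    r"(?P<message>.*(?:\r?\n(?![\d-]+\s[\d:.]+\|).*)*)",
    re.MULTILINE,
)

# Shortened sample log (assumption): entries 2 and 3 wrap onto a second line.
log = (
    "2022-04-01 00:00:00.0000|DEBUG|LOREM:LOREM|first entry on one line.\n"
    "2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|second entry starts here\n"
    "Ut condimentum nisl ipsum.\n"
    "2022-04-01 00:00:00.0000|DEBUG|LOREM:IPSUM|third entry starts here\n"
    "Ut condimentum nisl ipsum."
)

entries = [m.group("message") for m in ENTRY.finditer(log)]
for entry in entries:
    print(repr(entry))
```

Each wrapped entry should come back as a single match with the continuation line included, because the negative lookahead only stops the repetition when the next line starts a new timestamped record.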