I am having a hard time with a regex which includes newlines

Time:10-10

I am new to regex and I have almost figured out what I want, but I'm stuck on how to get the data when an entry in my log spans several lines (newlines/carriage returns).

This is my test string:

2022-09-21 05:45:24.8603 Debug Sensor SetState Started
2022-09-21 05:45:26.7529 Info Updater SensorDeploymentLogs no value returned from SensorDeployment
2022-09-21 12:37:47.1286 Error TaskAwaiter RunPeriodic <RegisterPeriodicTask>b__1 failed
System.Threading.Tasks.TaskCanceledException: A task was canceled.
    at async Task<HttpResponseMessage> System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task<HttpResponseMessage> sendTask, HttpRequestMessage request, CancellationTokenSource cts, bool disposeCts)
    at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(byte[] requestBytes, int offset, int count)
    at async Task<TResponse> CommunicationWebClient.SendWithRetryAsync<TResponse>(byte[] requestBytes, int offset, int count)
    at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(IRequestWithResponse<TResponse> request)
2022-09-21 12:42:53.2810 Info Updater Sensor no value returned from SensorDeployment

This is my regular expression:

^\d{4}\-(0[1-9]|1[012])\-(0[1-9]|[12][0-9]|3[01]) (?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d).\d{4}.*

The regular expression is saved here: https://regex101.com/r/mc4mVo/1

What I want is every entry in my log file, starting at its date/time stamp and including everything up to the next date/time stamp. My problem is that some entries continue over several lines. My regex already matches entries that fit on a single line, but it misses the following lines that belong to the same entry (such as the exception and stack trace), and I don't understand how to capture everything between two dates.

This is my expected result using regex only:

Result 1: 2022-09-21 05:45:24.8603 Debug Sensor SetState Started
Result 2: 2022-09-21 05:45:26.7529 Info Updater SensorDeploymentLogs no value returned from SensorDeployment
Result 3: 2022-09-21 12:37:47.1286 Error TaskAwaiter RunPeriodic <RegisterPeriodicTask>b__1 failed System.Threading.Tasks.TaskCanceledException: A task was canceled. at async Task<HttpResponseMessage> System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task<HttpResponseMessage> sendTask, HttpRequestMessage request, CancellationTokenSource cts, bool disposeCts) at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(byte[] requestBytes, int offset, int count) at async Task<TResponse> CommunicationWebClient.SendWithRetryAsync<TResponse>(byte[] requestBytes, int offset, int count) at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(IRequestWithResponse<TResponse> request)
Result 4: 2022-09-21 12:42:53.2810 Info Updater Sensor no value returned from SensorDeployment

Note: the language I'm using is C#.

So my question is simple: how do I get every entry in my log file, from one date to the next (including the date line itself), as in the expected result? It would be even better if the lines of a multi-line entry came back together as a group with their line breaks kept, rather than flattened into a one-liner, because otherwise they are hard to read.

Answer:

To get the data between the dates, you also need to match the lines that follow each dated line, with something like:

^(\d{4}-[\d-]+)\s+(\d[\d:]+)\.(\d+)\s+(.+(?:\n(?!\d{4}-).+)*)

See this demo at regex101

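Here .+ matches up to the end of the line, and (?:\n(?!\d{4}-).+)* catches any following lines that do not start with YYYY-, which is checked by a negative lookahead. The pattern relies on multiline mode (the m flag, or RegexOptions.Multiline in C#) so that ^ matches at the start of every line rather than only at the start of the input.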
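A minimal C# sketch of applying this pattern, assuming the whole log has already been read into one string (the sample below is a shortened copy of the question's log, with \n line endings). RegexOptions.Multiline makes ^ match at each line start, and splitting group 4 on '\n' returns the lines of each entry as a group, which is what the question asks for. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample of the question's log (assumption: "\n" line endings).
string log =
    "2022-09-21 05:45:24.8603 Debug Sensor SetState Started\n" +
    "2022-09-21 12:37:47.1286 Error TaskAwaiter RunPeriodic <RegisterPeriodicTask>b__1 failed\n" +
    "System.Threading.Tasks.TaskCanceledException: A task was canceled.\n" +
    "    at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(IRequestWithResponse<TResponse> request)\n" +
    "2022-09-21 12:42:53.2810 Info Updater Sensor no value returned from SensorDeployment\n";

// Groups: 1 = date, 2 = time, 3 = fraction of a second, 4 = message.
// The message continues over every following line that does not start with "YYYY-".
var entryPattern = new Regex(
    @"^(\d{4}-[\d-]+)\s+(\d[\d:]+)\.(\d+)\s+(.+(?:\n(?!\d{4}-).+)*)",
    RegexOptions.Multiline); // ^ matches at the start of each line

MatchCollection entries = entryPattern.Matches(log);

foreach (Match entry in entries)
{
    string date = entry.Groups[1].Value;
    string time = entry.Groups[2].Value;
    // Splitting the message on '\n' keeps the lines of one entry together.
    string[] lines = entry.Groups[4].Value.Split('\n');
    Console.WriteLine($"{date} {time}: {lines.Length} line(s)");
    foreach (string line in lines)
        Console.WriteLine("    " + line.TrimEnd('\r')); // drop '\r' if the file uses CRLF
}
```

If the real file uses \r\n line endings, the pattern still works because . matches \r in .NET; the TrimEnd('\r') call only cleans up the printed lines.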

If you just want to split your data into the four entries, split at every newline that is followed by a year: \n(?=\d{4}-)
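The split variant can be sketched in C# as follows, again assuming the log is already in one string (shortened sample, \n line endings assumed). Regex.Split cuts only at line breaks followed by YYYY-, so each chunk is one complete entry with its line breaks intact. The Regex.Replace step that joins a chunk's lines with single spaces is an addition here, included only to reproduce the one-line form of the question's expected result; skip it to keep the lines grouped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened sample of the question's log (assumption: "\n" line endings).
string log =
    "2022-09-21 05:45:24.8603 Debug Sensor SetState Started\n" +
    "2022-09-21 12:37:47.1286 Error TaskAwaiter RunPeriodic <RegisterPeriodicTask>b__1 failed\n" +
    "System.Threading.Tasks.TaskCanceledException: A task was canceled.\n" +
    "    at async Task<TResponse> CommunicationWebClient.SendAsync<TResponse>(IRequestWithResponse<TResponse> request)\n" +
    "2022-09-21 12:42:53.2810 Info Updater Sensor no value returned from SensorDeployment\n";

// Split only at newlines immediately followed by a year and a dash.
// TrimEnd() removes the trailing newline so no empty chunk is produced.
string[] chunks = Regex.Split(log.TrimEnd(), @"\n(?=\d{4}-)");

// Optional: collapse each chunk's line breaks (and indentation) into single spaces.
string[] oneLiners = chunks
    .Select(chunk => Regex.Replace(chunk, @"\s*\n\s*", " ").Trim())
    .ToArray();

for (int i = 0; i < oneLiners.Length; i++)
    Console.WriteLine($"Result {i + 1}: {oneLiners[i]}");
```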
