PHP: Parse log entry with regex into multiple pieces

Time:12-09

I need some help since I'm not much of a PHP regex expert. I have this line of text, whose format is always the same (only the message at the end changes):

2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1\r\nMESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.\r\n"

I have 3 functions which should return parts of the log entry:

public function get_log_file_entry_time( string $entry ): string {
    
}

public function get_log_file_entry_level( string $entry ): string {

}

public function get_log_file_entry_message( string $entry ): string {

}

I first tried explode() with a space as the delimiter. That works, but it is not ideal because the log message can be very long in some cases.

I've found the following combination to match the first two pieces: ([^\s]+) ([A-Z]+)

This gives me the timestamp and the level. Now I'm struggling to capture the message after the second group; maybe my grouping is off. Any advice would be appreciated!

Note

The message will start after the first whitespace after the logging level. For example:

Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1\r\nMESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.\r\n"

Answer:

You can use 3 capture groups, where the 3rd group contains the rest of the line, followed by all subsequent lines that do not start with a date-time-like pattern.

You can make the pattern for group 1 a bit more specific. To match the following lines that do not start with the group 1 pattern, you can recurse the first sub-pattern using (?1):

^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)

In parts, the pattern matches:

  • ^ Start of string
  • (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}) Capture group 1, match a date-time-like pattern
  • \h+ Match 1 or more horizontal whitespace chars
  • ([A-Z]+) Capture group 2, match 1 or more uppercase chars A-Z
  • \h+ Match 1 or more horizontal whitespace chars
  • ( Capture group 3
    • .* Match the rest of the line
    • (?:\R(?!(?1)).*)* Optionally repeat matching a newline and the rest of that line, asserting that what is directly to the right of the current position does not match sub-pattern 1 (the pattern of group 1)
  • ) Close group 3
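
As a side note on (?1): it re-applies the sub-pattern of group 1 at the current position, whereas a backreference \1 would require exactly the same text that group 1 captured. A minimal sketch of the difference, using a made-up two-digit pattern that is not part of the log format:

```php
<?php
// Hypothetical mini patterns, only to illustrate (?1) versus \1.
$recurse = '/^(\d{2}):(?1)$/'; // (?1) runs the sub-pattern \d{2} again
$backref = '/^(\d{2}):\1$/';   // \1 needs the same text that group 1 captured

var_dump(preg_match($recurse, '12:34')); // int(1): any two digits are fine
var_dump(preg_match($backref, '12:34')); // int(0): "34" is not "12"
var_dump(preg_match($backref, '12:12')); // int(1)
```

This is why the lookahead (?!(?1)) rejects any continuation line that starts with a timestamp, not only one that repeats the timestamp of the current entry.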


For example, with two log entries that both start with the same pattern:

// The m flag lets ^ match at the start of every line, so each entry is found.
$re = '/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)/m';
$str = '2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    print_r($match);
}

Output

Array
(
    [0] => 2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
    [1] => 2021-12-08T18:18:38+00:00
    [2] => INFO
    [3] => Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
)
Array
(
    [0] => 2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
    [1] => 2021-12-08T18:18:38+00:00
    [2] => INFO
    [3] => Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
)
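
To tie this back to the three functions from the question, here is a minimal sketch, assuming each function receives exactly one log entry. The class name LogEntryParser, the constant PATTERN and the helper part() are hypothetical and not from the question, and the sample entry is shortened from the original one.

```php
<?php
// Hypothetical wrapper class; only the three public method names come from the question.
final class LogEntryParser
{
    // Same pattern as above, without the m flag because a single entry is parsed.
    private const PATTERN =
        '/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)/';

    // Returns capture group $group, or an empty string if the entry does not match.
    private function part(string $entry, int $group): string
    {
        return preg_match(self::PATTERN, $entry, $m) === 1 ? $m[$group] : '';
    }

    public function get_log_file_entry_time(string $entry): string
    {
        return $this->part($entry, 1);
    }

    public function get_log_file_entry_level(string $entry): string
    {
        return $this->part($entry, 2);
    }

    public function get_log_file_entry_message(string $entry): string
    {
        return $this->part($entry, 3);
    }
}

// Shortened sample entry with a real CRLF inside the message.
$entry = "2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | \"STOCK_AVAILABLE;23;1;363;PCE;-1\r\n\"";

$parser = new LogEntryParser();
echo $parser->get_log_file_entry_time($entry), PHP_EOL;  // 2021-12-08T18:18:38+00:00
echo $parser->get_log_file_entry_level($entry), PHP_EOL; // INFO
```

Returning an empty string for a non-matching entry is just one choice made for this sketch; throwing an exception would work as well if malformed entries should not pass silently.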

Answer:

Here's a simple method with explode() and its limit parameter.

list($date, $severity, $message) = explode(' ', $str, 3);

var_dump($date, $severity, $message);
/*
string(25) "2021-12-08T18:18:38+00:00"
string(4) "INFO"
string(170) "Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1 MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.""
*/

This works as long as the parts are always separated by exactly one space and neither the timestamp nor the level can contain a space. If any part before the message sometimes contains spaces, the split will not be consistent.
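
If the amount of whitespace between the parts can vary, one possible variation is to split on runs of horizontal whitespace with preg_split() and the same limit, and to check that three parts actually came back. A sketch, where split_log_entry() is a hypothetical helper name and the sample entry is shortened:

```php
<?php
// Hypothetical helper: splits an entry into [time, level, message].
// Returns null when the entry has fewer than three parts.
function split_log_entry(string $entry): ?array
{
    // Limit 3: everything after the second separator stays in the message.
    $parts = preg_split('/\h+/', $entry, 3);

    return ($parts !== false && count($parts) === 3) ? $parts : null;
}

// Note the extra spaces before the level.
[$date, $severity, $message] = split_log_entry(
    '2021-12-08T18:18:38+00:00   INFO Produktbestand erfolgreich von Collmex abgerufen'
);

var_dump($date, $severity, $message);
```

The caveat about the timestamp and the level not containing whitespace themselves still applies to this variation.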
