I'm formatting the output of an SMTP server log for display on a secured website. I have already formatted the IP addresses, with and without port numbers (123.123.123.123 and 123.123.123.123:456), using /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/.
Now I need to format other numeric values, but not when they are combined with non-numeric characters, as in IDs and CRAM-MD5.
In the following example, I need to match the 100, 56 and 0, but neither the 5 of CRAM-MD5 nor the 21 (or 21E9) of 21E9C126E0B80. I also need the 0 after Client and the 2022070508301657009855590.
2022-07-05 12:00:00 New Client Rejected (192.241.222.210 [digitalocean.com] -> AbuseIPDB Score: 100)
2022-07-05 12:00:00 New Client Connected (137.184.30.176 [digitalocean.com] -> AbuseIPDB Score: 56)
2022-07-05 12:00:00 New Client Connected (192.168.10.12 [] -> AbuseIPDB Score: 0)
2022-07-05 12:00:00 250-AUTH LOGIN PLAIN CRAM-MD5
2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80
2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting
2022-07-05 12:00:00 Forward mail 2022070508301657009855590
I currently have the following regex, which matches 100 only: / [^a-zA-Z\/\.>(]\d [^a-zA-Z\/\.>)\-]/
Yes, I need a space in front, and I exclude the > to avoid formatting an already formatted string. And yes, some trailing characters need to be excluded as well.
Here is my code:
preg_match_all('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/', $sLog, $matches);
foreach ($matches[0] as $nr) {
    $sLog = str_replace($nr, '<span >' . $nr . '</span>', $sLog);
}
The test scenario is here: https://regex101.com/r/sbD10s/1.
The regex will be used inside preg_match_all().
Can anyone help me find the correct regex?
CodePudding user response:
Since you wrap your matches with other strings, you should use preg_replace directly.
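One practical reason for this can be sketched as follows. The two-line log and the helper names `wrapWithLoop` and `wrapDirect` are hypothetical and only serve the illustration, and a bare `<span>` stands in for the real markup. When the same address occurs more than once, the match-then-`str_replace` loop handles it once per occurrence and wraps it repeatedly, while a single `preg_replace` pass wraps each occurrence once.

```php
<?php
// Hypothetical log in which the same IP address occurs twice.
$sLog = "Client 0 from 192.168.10.12\nNew Client Connected (192.168.10.12)";
$ipPattern = '~\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}~';

// Approach from the question: collect the matches, then str_replace each one.
function wrapWithLoop(string $log, string $pattern): string
{
    preg_match_all($pattern, $log, $matches);
    foreach ($matches[0] as $nr) {
        // str_replace touches every occurrence of $nr, including the ones
        // that an earlier iteration has already wrapped.
        $log = str_replace($nr, '<span>' . $nr . '</span>', $log);
    }
    return $log;
}

// Single pass: every match is wrapped exactly once.
function wrapDirect(string $log, string $pattern): string
{
    return preg_replace($pattern, '<span>$0</span>', $log);
}

echo wrapWithLoop($sLog, $ipPattern), PHP_EOL;
echo wrapDirect($sLog, $ipPattern), PHP_EOL;
```

With this input, the loop version nests two `<span>` elements around each address, whereas the direct version produces one per address.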
To match numbers that follow whitespace and are not followed by a dot and another digit, you can use the (?<=\h)\d+\b(?!\.\d) pattern.
The whole solution for the current problem will look like
$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?~', '<span >$0</span>', $sLog);
$sLog = preg_replace('~(?<=\h)\d+\b(?!\.\d)~', '<span>$0</span>', $sLog);
Please adjust the replacement pattern in the second preg_replace to your liking. If the replacements are identical for both, just merge the two patterns into a single one:
$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?|(?<=\h)\d+\b(?!\.\d)~', '<span >$0</span>', $sLog);
See the (?<=\h)\d+\b(?!\.\d) regex demo:
(?<=\h) - immediately to the left, there must be a horizontal whitespace
\d+ - one or more digits
\b - a word boundary
(?!\.\d) - immediately on the right, there must be no . followed by a digit.
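To check the pattern against the log excerpt from the question, a small sketch like the following may help. It assumes the pattern is read as (?<=\h)\d+\b(?!\.\d), with \d+ for "one or more digits"; the helper name `findStandaloneNumbers` is hypothetical.

```php
<?php
// Hypothetical helper: returns every number in one log line that is
// preceded by horizontal whitespace, ends at a word boundary and is
// not followed by a dot plus a digit.
function findStandaloneNumbers(string $line): array
{
    preg_match_all('~(?<=\h)\d+\b(?!\.\d)~', $line, $m);
    return $m[0];
}

// Lines copied from the log excerpt in the question.
$lines = [
    '2022-07-05 12:00:00 New Client Rejected (192.241.222.210 [digitalocean.com] -> AbuseIPDB Score: 100)',
    '2022-07-05 12:00:00 250-AUTH LOGIN PLAIN CRAM-MD5',
    '2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80',
    '2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting',
    '2022-07-05 12:00:00 Forward mail 2022070508301657009855590',
];

foreach ($lines as $line) {
    echo implode(', ', findStandaloneNumbers($line)), PHP_EOL;
}
```

The 100, the 0 after Client and the long mail number are found, while the 5 of CRAM-MD5, the 21 of 21E9C126E0B80 and the IP octets are skipped. Note that on full lines the hour of the timestamp (12) and the SMTP reply code (250) also satisfy the pattern, because each follows a space and ends at a word boundary; if those should stay unformatted, the pattern would need a further restriction.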