Get only numeric values from a string

Time:07-06

I'm formatting the output of an SMTP server log for display on a secured website. I have already formatted the IP addresses, with and without port numbers (123.123.123.123 and 123.123.123.123:456), using /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/.

Now I need to format other numeric values, but not those combined with non-numeric characters (as in IDs and CRAM-MD5).

In the following example, I need to get the 100, 56 and 0, but neither the 5 of CRAM-MD5 nor the 21 (or 21E9) of 21E9C126E0B80. I also need the 0 after Client and the 2022070508301657009855590.

2022-07-05 12:00:00 New Client Rejected (192.241.222.210 [digitalocean.com] -> AbuseIPDB Score: 100)
2022-07-05 12:00:00 New Client Connected (137.184.30.176 [digitalocean.com] -> AbuseIPDB Score: 56)
2022-07-05 12:00:00 New Client Connected (192.168.10.12 [] -> AbuseIPDB Score: 0)
2022-07-05 12:00:00 250-AUTH LOGIN PLAIN CRAM-MD5
2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80
2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting
2022-07-05 12:00:00 Forward mail 2022070508301657009855590

I currently have the following regex, which gets me 100 only: / [^a-zA-Z\/\.>(]\d+[^a-zA-Z\/\.>)\-]/
Yes, I need a space in front and exclude the > to avoid formatting an already formatted string. And yes, there need to be some follow-up characters excluded.

Here is my code:

preg_match_all('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/', $sLog, $matches);
foreach ($matches[0] as $nr) {
    $sLog = str_replace($nr, '<span >' . $nr . '</span>', $sLog);
}

The test scenario is here: https://regex101.com/r/sbD10s/1.
The regex will be used inside preg_match_all().

Can anyone help me find the correct regex?

CodePudding user response:

Since you wrap your matches with other strings, you should use preg_replace directly.
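
One reason the loop is fragile: preg_match_all() returns a repeated value once per occurrence, and each str_replace() call rewrites every occurrence, so repeated values get wrapped more than once. A minimal sketch of the difference, assuming a hypothetical two-line log and a plain <span> wrapper (neither is from the question):

```php
<?php
// Hypothetical log in which the same IP address occurs twice.
$sLog = "Connect 192.168.10.12\nDisconnect 192.168.10.12";
$pattern = '/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/';

// Loop approach from the question: each match triggers a global str_replace,
// so the second iteration wraps the already wrapped addresses again.
$looped = $sLog;
preg_match_all($pattern, $looped, $matches);
foreach ($matches[0] as $nr) {
    $looped = str_replace($nr, '<span>' . $nr . '</span>', $looped);
}

// Single-pass approach: every match is wrapped exactly once.
$direct = preg_replace($pattern, '<span>$0</span>', $sLog);

echo $looped, "\n\n", $direct, "\n";
```

Here the looped version ends up with nested <span><span>...</span></span> markup, while preg_replace produces one wrapper per address.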

To match numbers that come right after whitespace and are not followed by a dot and another digit, you can use the (?<=\h)\d+\b(?!\.\d) pattern.

The whole solution for the current problem will look like

$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?~', '<span >$0</span>', $sLog);
$sLog = preg_replace('~(?<=\h)\d+\b(?!\.\d)~', '<span>$0</span>', $sLog);
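
The first call folds the three alternatives from the question into a single pattern with nested optional groups. A quick check, as a sketch, that both forms find the same text; the test line is a hypothetical one built from the address shapes mentioned in the question:

```php
<?php
// Pattern from the question: three explicit alternatives.
$long = '/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\d{1,3}\.\d{1,3}\.\d{1,3}/';
// Pattern from the answer: one base with nested optional groups.
$short = '~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?~';

// Hypothetical line covering all three shapes.
$line = 'from 123.123.123.123:456 via 123.123.123.123, status 2.0.0';

preg_match_all($long, $line, $a);
preg_match_all($short, $line, $b);

print_r($b[0]);            // the three dotted values, port included where present
var_dump($a[0] === $b[0]); // both patterns return the same list for this line
```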

Please adjust the replacement pattern in the second preg_replace to your liking. If the replacements are identical for both, just merge the two patterns into a single one:

$sLog = preg_replace('~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?|(?<=\h)\d+\b(?!\.\d)~', '<span >$0</span>', $sLog);
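
To see what the merged pattern picks up, here is a small sketch run against two lines copied from the sample log in the question. The plain <span> wrapper and the variable names $found and $out are assumptions made for the illustration:

```php
<?php
// Two lines copied from the sample log in the question.
$sLog = "2022-07-05 12:00:00 250 2.0.0 Ok: queued as 21E9C126E0B80\n"
      . "2022-07-05 12:00:00 Client 0 from 192.168.10.12 Disconnecting";

// Merged pattern: dotted groups (optionally with a port) OR standalone numbers.
$pattern = '~\d{1,3}\.\d{1,3}\.\d{1,3}(?:\.\d{1,3}(?::\d{1,5})?)?|(?<=\h)\d+\b(?!\.\d)~';

// Collect the matches for inspection, then do the actual replacement.
preg_match_all($pattern, $sLog, $m);
$found = $m[0];
$out = preg_replace($pattern, '<span>$0</span>', $sLog);

print_r($found);
echo $out, "\n";
```

On these two lines the matches are 12, 250, 2.0.0, 12, 0 and 192.168.10.12, and 21E9C126E0B80 is left untouched. The 12 is the hour of the timestamp: it is preceded by a space and followed by a colon, so the second alternative accepts it as well.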

The (?<=\h)\d+\b(?!\.\d) pattern breaks down as follows:

  • (?<=\h) - immediately to the left, there must be a horizontal whitespace
  • \d+ - one or more digits
  • \b - a word boundary
  • (?!\.\d) - immediately to the right, there must not be a . followed by a digit.
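
The effect of each part listed above can be checked in isolation. A minimal sketch, assuming short hypothetical fragments modelled on the sample log, each chosen to exercise one part of the pattern:

```php
<?php
$pattern = '~(?<=\h)\d+\b(?!\.\d)~';

// Fragment => expected matches (fragments are illustrative, not full log lines).
$cases = [
    'Score: 100)'        => ['100'], // space before, ")" gives a word boundary
    'Client 0 from'      => ['0'],   // space before and after
    'CRAM-MD5'           => [],      // "5" follows a letter, so (?<=\h) fails
    'as 21E9C126E0B80'   => [],      // digits run into a letter, so \b fails
    'from 192.168.10.12' => [],      // "192" is followed by ".1", so (?!\.\d) fails
];

foreach ($cases as $text => $expected) {
    preg_match_all($pattern, $text, $m);
    $status = ($m[0] === $expected) ? 'ok' : 'MISMATCH';
    printf("%-20s => [%s] %s\n", $text, implode(', ', $m[0]), $status);
}
```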