Consolidated RegEx to parse syslog data

Time:11-07

Goal

I am trying to craft a RegEx that will parse out specific data from various syslog entries that contain subtle differences in logged content. While I am able to accomplish my goal using multiple RegEx statements, if possible, I would like to combine these statements into a single consolidated RegEx.


Log entries

The main issue I'm having is that some log entries have a URL that needs to be parsed to a named group and other log entries do not have any URL. Examples of these two different log entries are provided below.

Entry with URL

Nov 3 11:33:04 host1 postfix/smtpd[12812]: NOQUEUE: reject: RCPT from 178.red-83-59-180.dynamicip.rima-tde.net[83.59.180.178]: 554 5.7.1 Service unavailable; Client host [83.59.180.178] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=83.59.180.178; [email protected] [email protected] proto=ESMTP helo=<178.red-83-59-180.dynamicip.rima-tde.net>

Entry without URL

Nov 2 16:01:25 host1 postfix/smtpd[31667]: NOQUEUE: reject_warning: RCPT from mail1.sendersrv.com[185.3.229.125]: 554 5.7.1 Service unavailable; Client host [185.3.229.125] blocked using bl.spamcop.net; from=bounces [email protected] [email protected] proto=ESMTP helo=<mail1.sendersrv.com>


RegEx statements

Of the RegEx statements that follow, the first two are what I currently use for the two log messages above. The third is my attempt at consolidating both into a single RegEx that parses data from either log message. I tried to use a conditional that checks for the existence of http(s) and, if found, parses the URL into a named group. If http(s) is not found, it parses everything up to the next RegEx token.

The issue is that when I test the RegEx against a log entry that has a URL, the RegEx does not seem to find http(s) when this token is made optional (i.e. using the ? quantifier). If I remove the ? quantifier, it does find http(s) and parses the URL as desired, but then the RegEx no longer works with log entries that do not have a URL.

Parse entries with URL

^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+);.+https?:\/{2}(?P<entryUrl>.+);\s.+\sto=\<(?P<destEm>.+)>.+$

Parse entries without URL

^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+);\s.+\sto=\<(?P<destEm>.+)>.+$
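
The two-pattern approach can be sketched in Python roughly as follows. This is only an illustration: parse_entry is a made-up helper name, and because the addresses in the sample entries are redacted, the test lines use placeholder addresses (sender@example.com, rcpt@example.org) in the from=<...> to=<...> form that the patterns expect. The URL pattern is tried first, since the pattern without a URL also matches entries that contain one and then pulls the URL into blkList.

```python
import re

# Pattern for entries that carry a URL after the block list name.
WITH_URL = re.compile(
    r"^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)"
    r"\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+);"
    r".+https?:\/{2}(?P<entryUrl>.+);\s.+\sto=\<(?P<destEm>.+)>.+$"
)
# Pattern for entries without a URL.
WITHOUT_URL = re.compile(
    r"^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)"
    r"\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+);"
    r"\s.+\sto=\<(?P<destEm>.+)>.+$"
)


def parse_entry(line):
    """Hypothetical helper: try the URL pattern first, then fall back."""
    match = WITH_URL.match(line) or WITHOUT_URL.match(line)
    return match.groupdict() if match else None


# Sample lines; the from=/to= addresses are placeholders (assumption).
entry_with_url = (
    "Nov 3 11:33:04 host1 postfix/smtpd[12812]: NOQUEUE: reject: RCPT from "
    "178.red-83-59-180.dynamicip.rima-tde.net[83.59.180.178]: 554 5.7.1 "
    "Service unavailable; Client host [83.59.180.178] blocked using "
    "b.barracudacentral.org; "
    "http://www.barracudanetworks.com/reputation/?pr=1&ip=83.59.180.178; "
    "from=<sender@example.com> to=<rcpt@example.org> proto=ESMTP "
    "helo=<178.red-83-59-180.dynamicip.rima-tde.net>"
)
entry_without_url = (
    "Nov 2 16:01:25 host1 postfix/smtpd[31667]: NOQUEUE: reject_warning: "
    "RCPT from mail1.sendersrv.com[185.3.229.125]: 554 5.7.1 "
    "Service unavailable; Client host [185.3.229.125] blocked using "
    "bl.spamcop.net; from=<sender@example.com> to=<rcpt@example.org> "
    "proto=ESMTP helo=<mail1.sendersrv.com>"
)

for line in (entry_with_url, entry_without_url):
    print(parse_entry(line))
```

The result for an entry without a URL has no entryUrl key at all, which is one of the reasons a single pattern with an optional group is more convenient downstream.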

Attempt at consolidating RegEx

^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+)(?<=[a-z]);.+(https?:\/{2})?(?(5)(?P<entryUrl>.+)|.+)to=\<(?P<destEm>.+)>.+$
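
For reference, the symptom can be reproduced with a short Python sketch. It assumes Python's re module, which accepts the (?(5)...) conditional, and a sample line with placeholder addresses (sender@example.com, rcpt@example.org) standing in for the redacted ones. The pattern matches the entry, but the optional group 5 never participates, so entryUrl stays empty.

```python
import re

# The consolidated attempt, split across lines only for readability.
CONSOLIDATED = re.compile(
    r"^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)"
    r"\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+)(?<=[a-z]);"
    r".+(https?:\/{2})?(?(5)(?P<entryUrl>.+)|.+)to=\<(?P<destEm>.+)>.+$"
)

# Sample line with a URL; the from=/to= addresses are placeholders (assumption).
entry_with_url = (
    "Nov 3 11:33:04 host1 postfix/smtpd[12812]: NOQUEUE: reject: RCPT from "
    "178.red-83-59-180.dynamicip.rima-tde.net[83.59.180.178]: 554 5.7.1 "
    "Service unavailable; Client host [83.59.180.178] blocked using "
    "b.barracudacentral.org; "
    "http://www.barracudanetworks.com/reputation/?pr=1&ip=83.59.180.178; "
    "from=<sender@example.com> to=<rcpt@example.org> proto=ESMTP "
    "helo=<178.red-83-59-180.dynamicip.rima-tde.net>"
)

match = CONSOLIDATED.match(entry_with_url)
print(match.group("blkList"))   # b.barracudacentral.org
print(match.group(5))           # None: the optional group matched nothing
print(match.group("entryUrl"))  # None: the conditional took its "no" branch
print(match.group("destEm"))    # rcpt@example.org
```

The greedy .+ in front of the optional group first runs to the end of the line and then backs off only as far as needed for the rest of the pattern to succeed. Since skipping the optional group is always allowed, the engine settles on a position just before to=< and never backs off far enough to reach the URL.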

I'm sure the issue is my misunderstanding as to how the conditional statements and the ? quantifier works. All suggestions are welcome and thanks in advance for your time.

CodePudding user response:

Have you tried testing your regex on a page like regex101?
to=\<(?P<destEm>.+)> doesn't seem to match your examples. You should either remove the <> or replace to with helo. Be careful to make the blkList quantifier lazy, otherwise you might capture too much text.
You can then make the URL part optional with ? and it should work in both cases:

^(?P<datetime>.+) host1 postfix.+RCPT from (?P<srcDns>.+)\[(?P<srcIp>[0-9\.]+)\]:.+blocked using (?P<blkList>.+?);(.+https?:\/{2}(?P<entryUrl>.+);\s)?.+\sto=(?P<destEm>.+?)\s.*$

CodePudding user response:

One approach would be to replace, in the first regex, .+https?:\/{2}(?P<entryUrl>.+); with (?:.+https?:\/{2}(?P<entryUrl>.+);)? where ?: marks the group as non-capturing and the ? at the end makes it optional.

However, that alone still does not work because .+ is greedy, so use the lazy .+? instead.
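
The difference between the greedy and the lazy form can be seen on a cut-down example. This is a minimal sketch in Python; the text and the host names in it are invented for illustration, and the two patterns are shortened versions of the relevant part of the regex.

```python
import re

# Invented, shortened stand-in for the tail of a log entry.
text = "blocked using b.example.org; http://example.com/x; from=<a@example.com>"

# Greedy quantifiers: blkList runs up to the LAST ';' and the optional
# URL group is then skipped, because skipping still lets the match succeed.
greedy = re.compile(
    r"blocked using (?P<blkList>.+);(?:.+https?://(?P<entryUrl>.+);)?\s"
)
# Lazy quantifiers: blkList stops at the FIRST ';' so the optional
# group gets the chance to match the URL.
lazy = re.compile(
    r"blocked using (?P<blkList>.+?);(?:.+?https?://(?P<entryUrl>.+?);)?\s"
)

g = greedy.search(text)
print(g.group("blkList"))   # b.example.org; http://example.com/x
print(g.group("entryUrl"))  # None

l = lazy.search(text)
print(l.group("blkList"))   # b.example.org
print(l.group("entryUrl"))  # example.com/x
```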

Final regex:

^(?P<datetime>.+?) host1 postfix.+?RCPT from (?P<srcDns>.+?)\[(?P<srcIp>[0-9\.]+)\]:.+?blocked using (?P<blkList>.+?);(?:.+?https?:\/{2}(?P<entryUrl>.+?);)?\s.+?\sto=\<(?P<destEm>.+?)>.+?$

https://regex101.com/r/QkmXWz (to see it in action)
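
The same check can be done locally. Below is a sketch using Python's re module; the two sample lines follow the entries from the question, with placeholder addresses (sender@example.com, rcpt@example.org) assumed in place of the redacted ones.

```python
import re

# The final regex, split across lines only for readability.
FINAL = re.compile(
    r"^(?P<datetime>.+?) host1 postfix.+?RCPT from (?P<srcDns>.+?)"
    r"\[(?P<srcIp>[0-9\.]+)\]:.+?blocked using (?P<blkList>.+?);"
    r"(?:.+?https?:\/{2}(?P<entryUrl>.+?);)?\s.+?\sto=\<(?P<destEm>.+?)>.+?$"
)

# Sample lines; the from=/to= addresses are placeholders (assumption).
entry_with_url = (
    "Nov 3 11:33:04 host1 postfix/smtpd[12812]: NOQUEUE: reject: RCPT from "
    "178.red-83-59-180.dynamicip.rima-tde.net[83.59.180.178]: 554 5.7.1 "
    "Service unavailable; Client host [83.59.180.178] blocked using "
    "b.barracudacentral.org; "
    "http://www.barracudanetworks.com/reputation/?pr=1&ip=83.59.180.178; "
    "from=<sender@example.com> to=<rcpt@example.org> proto=ESMTP "
    "helo=<178.red-83-59-180.dynamicip.rima-tde.net>"
)
entry_without_url = (
    "Nov 2 16:01:25 host1 postfix/smtpd[31667]: NOQUEUE: reject_warning: "
    "RCPT from mail1.sendersrv.com[185.3.229.125]: 554 5.7.1 "
    "Service unavailable; Client host [185.3.229.125] blocked using "
    "bl.spamcop.net; from=<sender@example.com> to=<rcpt@example.org> "
    "proto=ESMTP helo=<mail1.sendersrv.com>"
)

parsed_with_url = FINAL.match(entry_with_url).groupdict()
parsed_without_url = FINAL.match(entry_without_url).groupdict()

print(parsed_with_url["entryUrl"])     # www.barracudanetworks.com/reputation/?pr=1&ip=83.59.180.178
print(parsed_without_url["entryUrl"])  # None
print(parsed_without_url["blkList"])   # bl.spamcop.net
```

Because https?:\/{2} sits outside the named group, entryUrl holds the URL without its scheme. An entry without a URL still has the entryUrl key, with the value None.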
