Regex rule optional matching characters set in a specific point

Time:04-09

I'm working with regex in a PCRE2 environment.

In my switch logs I need to capture a text string, which I name "message", that sits at a specific position. The key point is that it is always preceded by a set of characters ending with :, but after them there may or may not be some additional characters ending with ;, and I must be able to skip them.

Let me explain with my current regex and some log samples.

There are three possible cases:

 1. (s)[18014]:Recorded command information.
 2. (l):User logged out.
 3. (s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.

My current regex is:

\(\w+\)\[*\d*\]*\:(?<message>[^\[]+?\.)

which works for cases 1 and 2 because:

  • \(\w+\) matches the fixed (, one or more word characters, and ).
  • \[*\d*\]* matches the optional [, number, and ] that may follow, as in cases 1 and 3.
  • In every case the next character is :, which I match with \:
  • The message is captured, and named, with (?<message>[^\[]+?\.), which must not capture if a [ follows the :. The capture stops at the first .

My problem is case 3: after the : there can be a part that always begins with CID=<hexadecimal expression>, but it is not limited to this. Other expressions can follow it, always ended by ;. So for case 3 I can have CID=<hex expression><other numeric and literal characters>;. With the current regex, of course, the CID part is included in the message. I must avoid that: if the CID part is present, the message capture must start after the ; that ends it.

To summarize: IF there is no CID part after the :, start capturing; ELSE, skip everything up to the ; and start capturing after it.
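The behaviour described above can be reproduced with a minimal sketch. It uses Python's re module rather than PCRE2, so the named group is spelled (?P<message>...) instead of (?<message>...); the pattern is otherwise the one from the question, and the names CURRENT, SAMPLES and current_message are only illustrative.

```python
import re

# The question's pattern, with Python's (?P<name>...) spelling for the named group.
CURRENT = re.compile(r"\(\w+\)\[*\d*\]*:(?P<message>[^\[]+?\.)")

# The three sample log fragments from the question.
SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]


def current_message(line):
    """Return the 'message' group of the current pattern, or None if no match."""
    match = CURRENT.search(line)
    return match.group("message") if match else None


for line in SAMPLES:
    # Cases 1 and 2 print the bare message; case 3 still carries the CID prefix.
    print(current_message(line))
```

For case 3 the captured text starts at CID=, which is exactly the part that has to be skipped.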

CodePudding user response:

The following pattern will match the right part of your test strings.
We look for either a : not followed by CID, using the negative lookahead (?!CID), or a ;. We then capture what follows.

((:(?!CID))|;)(.*)

see https://regex101.com/r/JRB4Rq/1
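As a rough check outside regex101, the same pattern can be tried with Python's re module (an assumption here, since the question targets PCRE2; this particular pattern uses no syntax that differs between the two). The helper name tail_after_prefix is hypothetical. The message ends up in group 3, because groups 1 and 2 hold the : or ; delimiter.

```python
import re

# Either a ':' not followed by 'CID', or a ';', then capture the rest of the line.
PATTERN = re.compile(r"((:(?!CID))|;)(.*)")

SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]


def tail_after_prefix(line):
    """Return the text after the first qualifying ':' or ';', or None."""
    match = PATTERN.search(line)
    # Group 3 is the trailing (.*); groups 1 and 2 are the delimiter itself.
    return match.group(3) if match else None


for line in SAMPLES:
    print(tail_after_prefix(line))
```

Note that this pattern is not anchored to the (x)[n] prefix, so it relies on the first qualifying : or ; being the one that precedes the message.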

CodePudding user response:

You could write the pattern as:

\(\w+\)(?:\[\d+\])?:(?:CID=[^;]*;)?(?<message>[^.]+\.)

Explanation

  • \(\w+\) Match 1+ word chars between parentheses
  • (?:\[\d+\])? Optionally match 1+ digits between square brackets
  • : Match the colon (you don't have to escape it)
  • (?:CID=[^;]*;)? Optionally match the CID= part till the first semicolon
  • (?<message>[^.]+\.) Group message, match 1+ chars other than . and then match the .

See a regex demo.
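A minimal sketch of this pattern run against the three samples from the question, assuming Python's re module instead of PCRE2. The only change is the named-group spelling, (?P<message>...) in place of (?<message>...); the helper name extract_message is hypothetical.

```python
import re

# Prefix "(x)", optional "[digits]", ':', optional "CID=...;", then the message.
PATTERN = re.compile(
    r"\(\w+\)(?:\[\d+\])?:(?:CID=[^;]*;)?(?P<message>[^.]+\.)"
)

SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]


def extract_message(line):
    """Return the 'message' group, with any CID=...; prefix skipped, or None."""
    match = PATTERN.search(line)
    return match.group("message") if match else None


for line in SAMPLES:
    print(extract_message(line))
```

Because the message group is [^.]+\., the capture stops at the first dot, so a message that itself contains a . (for example an IP address) would be cut short.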
