How can I write a regex expression to capture characters up to a certain character

Time:08-16

I am trying to write a PCRE regex to pull specific information from syslog. I need one portion of the log and do not care about anything that comes after it. The problem is that the character I am using to end the match still shows up in the result. Here is the full log:

Aug 15 20:41:30 10.240.8.160 42286: servername Aug 16 2022 01:41:28.245  0000: %ICM_Router_CallRouter-3-1050042: %[comp=Router-*][pname=rtr][iid=prod][mid=1050042][sev=error]: **No default label available for dialed number: SM01.GGB.ACCT.BILLING.5555550778** (ID: 44043).

The part I need is No default label available for dialed number: SM01.GGB.ACCT.BILLING.5555550778

The closest I have gotten is \bNo.+[\(], which matches No default label available for dialed number: SM01.GGB.ACCT.BILLING.5555550778 (. I have also tried using ^\s with no success. When I negate the parenthesis with \bNo.+[^\(], the following is still matched:

No default label available for dialed number: SM01.GGB.ACCT.BILLING.5555550778 (ID: 44043).

Can someone let me know what I am missing?

CodePudding user response:

If the portion always ends with a dot followed by digits, and you don't want to match ( in between:

\bNo\b[^(]+\.\d+

Explanation

  • \bNo\b Match the word No between word boundaries
  • [^(]+ Match 1+ chars other than (
  • \.\d+ Match a dot and 1+ digits
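
As a quick check outside a PCRE tool, here is a minimal sketch of the pattern \bNo\b[^(]+\.\d+ in Python's built-in re module, which handles this particular pattern the same way. The sample string is an abridged tail of the log from the question, and the names LOG_TAIL, PATTERN and message are only illustrative.

```python
import re

# Abridged tail of the syslog line from the question (assumption: no ** markers).
LOG_TAIL = (
    "[sev=error]: No default label available for dialed number: "
    "SM01.GGB.ACCT.BILLING.5555550778 (ID: 44043)."
)

# [^(]+ can never cross the opening parenthesis, and the engine backtracks
# until the match ends with a dot followed by digits.
PATTERN = re.compile(r"\bNo\b[^(]+\.\d+")

match = PATTERN.search(LOG_TAIL)
message = match.group(0) if match else None
print(message)
```

The negated class [^(] is what keeps the match from running into (ID: 44043), which is where the greedy .+ in the question went wrong.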


Or taking the ** into account:

\*\*\KNo\b[^(]+(?=\*\*\s*\()

Explanation

  • \*\* Match **
  • \K Clear the current match buffer (forget what is matched until now)
  • No\b Match the word No
  • [^(]+ Match 1+ chars other than (
  • (?= Positive lookahead
    • \*\*\s*\( Match ** followed by optional spaces and (
  • ) Close the lookahead
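
If you ever need the same idea outside PCRE, note that Python's built-in re module does not support \K. The sketch below swaps in a fixed-width lookbehind, (?<=\*\*), which is a different technique that gives the same result for this log line. The sample string and variable names are assumptions made for illustration.

```python
import re

# Abridged tail of the syslog line, this time keeping the ** markers.
LOG_TAIL = (
    "[sev=error]: **No default label available for dialed number: "
    "SM01.GGB.ACCT.BILLING.5555550778** (ID: 44043)."
)

# (?<=\*\*) requires ** right before "No" without including it in the match
# (it stands in for \*\*\K here); the lookahead requires ** plus optional
# whitespace and "(" right after the match, again without consuming them.
PATTERN = re.compile(r"(?<=\*\*)No\b[^(]+(?=\*\*\s*\()")

match = PATTERN.search(LOG_TAIL)
message = match.group(0) if match else None
print(message)
```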


CodePudding user response:

With your shown samples, please try the following regex.

\bNo\b.*?\sdialed number:.*?\bACCT\.BILLING\.\d+

Explanation:

\bNo\b         ##Matching string/word No with word boundaries.
.*?\s          ##using lazy match matching till space here.
dialed number: ##Matching dialed number: here.
.*?\bACCT      ##Lazy match up to a word boundary followed by ACCT.
\.BILLING      ##Matching literal dot followed by BILLING.
\.\d+          ##Matching literal dot followed by 1 or more digits.
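
The breakdown above can also be kept next to the pattern itself. This is a minimal sketch using Python's re.VERBOSE flag, assuming an abridged sample line. In verbose mode unescaped whitespace is ignored, so the literal space in "dialed number:" is written as [ ].

```python
import re

PATTERN = re.compile(
    r"""
    \bNo\b            # the word No between word boundaries
    .*?\s             # lazy match up to a whitespace character
    dialed[ ]number:  # literal text; the space is written as [ ] in verbose mode
    .*?\bACCT         # lazy match up to a word boundary followed by ACCT
    \.BILLING         # literal dot followed by BILLING
    \.\d+             # literal dot followed by 1 or more digits
    """,
    re.VERBOSE,
)

# Abridged tail of the syslog line from the question (illustrative sample).
LOG_TAIL = (
    "[sev=error]: No default label available for dialed number: "
    "SM01.GGB.ACCT.BILLING.5555550778 (ID: 44043)."
)

match = PATTERN.search(LOG_TAIL)
message = match.group(0) if match else None
print(message)
```

Because this pattern hard-codes ACCT.BILLING, it only matches dialed numbers that contain that exact segment.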

CodePudding user response:

How about using a positive lookahead and optional characters like this?

\bNo.+?(?=\**\s?\()
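
Here is a minimal sketch of how the pattern \bNo.+?(?=\**\s?\() behaves, written with Python's re module and two assumed sample lines, one with the ** markers and one without. The lazy .+? grows one character at a time and stops at the first position where the lookahead sees optional asterisks, an optional whitespace character and then (.

```python
import re

PATTERN = re.compile(r"\bNo.+?(?=\**\s?\()")

EXPECTED = "No default label available for dialed number: SM01.GGB.ACCT.BILLING.5555550778"

# Two illustrative variants of the log tail: with and without the ** markers.
with_stars = "[sev=error]: **" + EXPECTED + "** (ID: 44043)."
plain = "[sev=error]: " + EXPECTED + " (ID: 44043)."

results = [PATTERN.search(line).group(0) for line in (with_stars, plain)]
for text in results:
    print(text)
```

Since \** also allows zero asterisks, the same pattern covers both variants of the line.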

CodePudding user response:

Well, there must be a certain rule which denotes the text you want to match.

I think the rule is something like:

The text "No default label available for dialed number: ", followed by some alphanumeric, dot-separated identification code.

An associated regex would then be:

No default label available for dialed number: [0-9A-Za-z.]+

Note that the [0-9A-Za-z.]+ part is simplified: it also matches dots at the beginning and end of the identification code, as well as multiple consecutive dots, which may be undesired. You can also loosen the identification code regex as you like. For example, \S matches any non-whitespace character, so if you assume that the identification code is always followed by whitespace, the pattern \S+ works just fine.
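
To make the trade-off concrete, here is a small sketch in Python comparing three variants. LOOSE is the simplified class with a + quantifier, NON_SPACE uses \S+, and STRICT is one possible tightening, not part of the suggestion above, that requires alphanumeric groups separated by single dots. The sample lines and all names are assumptions for illustration.

```python
import re

PREFIX = r"No default label available for dialed number: "

LOOSE = re.compile(PREFIX + r"[0-9A-Za-z.]+")
NON_SPACE = re.compile(PREFIX + r"\S+")
# Hypothetical stricter variant: no leading, trailing, or doubled dots.
STRICT = re.compile(PREFIX + r"[0-9A-Za-z]+(?:\.[0-9A-Za-z]+)*")

good = (
    "[sev=error]: No default label available for dialed number: "
    "SM01.GGB.ACCT.BILLING.5555550778 (ID: 44043)."
)
odd = "No default label available for dialed number: ..SM01 (ID: 1)."

code = "SM01.GGB.ACCT.BILLING.5555550778"
good_matches = [p.search(good).group(0) for p in (LOOSE, NON_SPACE, STRICT)]
loose_odd = LOOSE.search(odd).group(0)   # the leading dots are accepted
strict_odd = STRICT.search(odd)          # None: the code may not start with a dot
print(good_matches, loose_odd, strict_odd)
```

If the raw log really contains the ** markers directly after the code, \S+ would pick them up too, while the two character-class variants stop before them.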
