How to conditionally expect particular characters if a prior regex matched?

Time:12-21

I want to expect certain characters only if an earlier part of the regex matched; otherwise nothing (the empty string) should be expected. For instance, if a string from the group (A10, B32, C56, D65) (a kind of enumeration) appears after the first four characters, then a "_" followed by a 3-digit number such as 123 is expected. If no element of that group appears, no further string is expected. My first attempt was the following, but the ELSE branch does not work:

^XXX_(?<DT>A12|B43|D14)(?(DT)(_\d{1,3})|)\.ZZZ$
  • XXX_A12_123.ZZZ --> match
  • XXX_A11.ZZZ --> match
  • XXX_A12_abc.ZZZ --> no match
  • XXX_A23_123.ZZZ --> no match

These are examples of filenames. If the filename contains a string from the mentioned group, like A12 or C56, then I expect that element to be followed by an underscore and 1 to 3 digits. If the filename does not contain a string from that group (either nothing, or a character sequence different from the strings in the group), then I don't want to see the underscore followed by 1 to 3 digits.

For instance, I could extend the regex to

^XXX_(?<DT>A12|B43|D14)_\d{5}(?(DT)(_\d{1,3})|)_someMoreChars\.ZZZ$

...and then I want these filenames to be valid:

  • XXX_A12_12345_123_wellDone.ZZZ
  • XXX_Q21_00000_wellDone.ZZZ
  • XXX_Q21_00000_456_wellDone.ZZZ

...but this is invalid:

  • XXX_A12_12345_wellDone.ZZZ

How can I make the ELSE branch of the conditional statement work?

In the end I intend to have two groups, for example:

  • Group A: (A11, B32, D76, R33)
  • Group B: (A23, C56, H78, T99)

If an element of group A occurs in the filename, then I expect to find _\d{1,3} in the filename. If an element of group B occurs in the filename, then the _\d{1,3} shall be optional (it may or may not occur in the filename).

I ended up with this regex:

^XXX_(?:(?<DT>A12|B43|D14))?(?(DT)(_\d{5}_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).*\.ZZZ$
^XXX_(?:(?<DT>A12|B43|D14))?_\d{5}(?(DT)(_\d{1,3})|(?!(?&DT))(?!.*_\d{3}(?!\d))).*\.ZZZ$

Since I have to use this regex in the OpenAPI @Pattern annotation, I run into this error:

Conditionals are not supported in this regex dialect.

As @The fourth bird suggested, alternation seems to do the trick:

XXX_((((A12|B43|D14)_\d{5}_\d{1,3}))|((?:(A10|B10|C20)((?:_\d{5}_\d{3})|(?:_\d{3}))))).*\.ZZZ$
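To see how the conditional-free pattern above behaves, here is a minimal Python sketch. It assumes the validator requires the whole string to match, which is modelled here with `re.fullmatch`; the helper name `is_valid_alternation` and the `A10` filenames are made up for illustration.

```python
import re

# The alternation-only pattern from the question, copied unchanged.
# It uses no conditionals, so it is also accepted by dialects that lack them.
ALTERNATION_ONLY = re.compile(
    r"XXX_((((A12|B43|D14)_\d{5}_\d{1,3}))"
    r"|((?:(A10|B10|C20)((?:_\d{5}_\d{3})|(?:_\d{3}))))).*\.ZZZ$"
)

def is_valid_alternation(filename: str) -> bool:
    """Return True if the whole filename matches the alternation-only pattern."""
    return ALTERNATION_ONLY.fullmatch(filename) is not None

for name in [
    "XXX_A12_12345_123_wellDone.ZZZ",  # first group, with the numeric suffix
    "XXX_A12_12345_wellDone.ZZZ",      # first group, suffix missing
    "XXX_A10_00000_456_wellDone.ZZZ",  # second group (hypothetical filename)
]:
    print(name, is_valid_alternation(name))
```

Each branch of the alternation spells out one complete allowed shape, which replaces the if/else logic at the cost of some repetition.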

Answer:

The else branch is the part after the |. But if you also want to match the second example, the if clause will not help, because by the time it is evaluated the pattern has already matched one of A12|B43|D14.

The named capture group is not optional, so the if clause will always be true.
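This can be checked with a short Python sketch. Python's `re` module supports the same conditional construct but spells the named group `(?P<DT>...)`, so the pattern is adapted accordingly; the helper name `is_valid` is made up for illustration.

```python
import re

# The question's first attempt, with the named group written in Python's
# (?P<DT>...) syntax. The group DT is mandatory, so the pattern can only
# continue when one of A12, B43 or D14 has been matched.
FIRST_ATTEMPT = re.compile(r"^XXX_(?P<DT>A12|B43|D14)(?(DT)(_\d{1,3})|)\.ZZZ$")

def is_valid(filename: str) -> bool:
    """Return True if the filename matches the first attempt."""
    return FIRST_ATTEMPT.match(filename) is not None

for name in ["XXX_A12_123.ZZZ", "XXX_A11.ZZZ", "XXX_A12_abc.ZZZ", "XXX_A23_123.ZZZ"]:
    print(name, is_valid(name))
```

`XXX_A11.ZZZ` is rejected before the conditional is ever reached, because `A11` is not one of the alternatives in the mandatory group.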

What you can do instead is use an alternation to match either one of the enumerated values followed by an underscore and 1 to 3 digits, or an uppercase character followed by 2 digits.

^XXX_(?:(?<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
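As a rough check of the alternation against the four filenames from the question, here is a small Python sketch. The named group is written `(?P<DT>...)` as Python requires, and the helper name `matches_alternation` is made up for illustration.

```python
import re

# Alternation: either an enumerated value plus "_" and 1 to 3 digits,
# or any uppercase letter followed by 2 digits.
ALTERNATION = re.compile(r"^XXX_(?:(?P<DT>A12|B43|D14)_\d{1,3}|[A-Z]\d{2})\.ZZZ$")

def matches_alternation(filename: str) -> bool:
    """Return True if the filename matches the alternation pattern."""
    return ALTERNATION.match(filename) is not None

for name in ["XXX_A12_123.ZZZ", "XXX_A11.ZZZ", "XXX_A12_abc.ZZZ", "XXX_A23_123.ZZZ"]:
    print(name, matches_alternation(name))
```

One thing to be aware of: `XXX_A12.ZZZ` also matches, through the second branch, because `[A-Z]\d{2}` accepts `A12`. If that should be rejected, putting a negative lookahead such as `(?!A12|B43|D14)` in front of `[A-Z]\d{2}` is one possible way to exclude it.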


If you want to make use of the if/else clause, you can make the named capture group optional, and then check whether group DT took part in the match.

^XXX_(?<DT>A12|B43|D14)?(?(DT)_\d{1,3}|[A-Z]\d{2})\.ZZZ$
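A minimal Python sketch of this variant follows, assuming Python's `re` module, where the named group is written `(?P<DT>...)` and the conditional `(?(DT)yes|no)` works as in the pattern above. The helper name `matches_conditional` is made up for illustration.

```python
import re

# Optional named group DT, then a conditional:
#   if DT matched -> require "_" and 1 to 3 digits
#   otherwise     -> require an uppercase letter and 2 digits
CONDITIONAL = re.compile(r"^XXX_(?P<DT>A12|B43|D14)?(?(DT)_\d{1,3}|[A-Z]\d{2})\.ZZZ$")

def matches_conditional(filename: str) -> bool:
    """Return True if the filename matches the conditional pattern."""
    return CONDITIONAL.match(filename) is not None

for name in ["XXX_A12_123.ZZZ", "XXX_A11.ZZZ", "XXX_A12_abc.ZZZ", "XXX_A23_123.ZZZ"]:
    print(name, matches_conditional(name))
```

Because DT is optional here, the else branch can actually be reached, which is what the original attempt was missing.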


For the updated question:

^XXX_(?<DT>A12|B43|D14)?(?(DT)(?:_\d{5})?_\d{3}(?!\d)|(?!A12|B43|D14|[A-Z]\d{2}_\d{3}(?!\d))).*\.ZZZ$

The pattern matches:

  • ^ Start of string
  • XXX_ Match literally
  • (?<DT>A12|B43|D14)? Optional named group DT: match one of A12, B43 or D14 if present
  • (?(DT) If we have group DT
    • (?:_\d{5})? Optionally match _ and 5 digits
    • _\d{3}(?!\d) Match _ and 3 digits, not followed by another digit
    • | Or
    • (?! Negative lookahead, assert not to the right
      • A12|B43|D14| Match one of the alternatives, or
      • [A-Z]\d{2}_\d{3}(?!\d) Match 1 char A-Z, 2 digits _ 3 digits not followed by a digit
    • ) Close lookahead
  • ) Close if clause
  • .* Match the rest of the line
  • \.ZZZ Match . and ZZZ
  • $ End of string
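The breakdown above can be tried out with a short Python sketch using the filenames from the updated question. The named group is written `(?P<DT>...)` as Python's `re` module requires; the helper name `matches_updated` is made up for illustration.

```python
import re

# Pattern for the updated question, adapted to Python's named-group syntax.
UPDATED = re.compile(
    r"^XXX_(?P<DT>A12|B43|D14)?"
    r"(?(DT)(?:_\d{5})?_\d{3}(?!\d)"                # DT matched: "_" + 3 digits required
    r"|(?!A12|B43|D14|[A-Z]\d{2}_\d{3}(?!\d)))"     # else: assert what must not follow
    r".*\.ZZZ$"
)

def matches_updated(filename: str) -> bool:
    """Return True if the filename matches the updated pattern."""
    return UPDATED.match(filename) is not None

for name in [
    "XXX_A12_12345_123_wellDone.ZZZ",
    "XXX_Q21_00000_wellDone.ZZZ",
    "XXX_Q21_00000_456_wellDone.ZZZ",
    "XXX_A12_12345_wellDone.ZZZ",  # listed as invalid in the question
]:
    print(name, matches_updated(name))
```

The lookahead in the else branch repeats `A12|B43|D14` so that a filename cannot slip through by skipping the optional group when the required `_` and 3 digits are missing.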

