Is there a "better" way to do conditional lookaround?

Time:03-27

(?=(pattern1|pattern2)) (pattern1)? (pattern2)? pattern3|pattern4

This is what I have ended up with, and it works, but is there a more parsimonious method? I want to find all strings that contain "box" preceded by one or more prefixes: a pack description, a length, or both.

Marlboro 100's Box
Marlboro Gold Pack 100's Box
Marlboro Special Blend (Gold Pack) 100's Box
Marlboro Silver Pack Box
Marlboro Special Blend (Red Pack) 100s Box
Pall Mall RED 100 BOX
Marlboro Special Blend (Gold Pack) Box

(?i)(?=(((\()?(red|gold|silver|king)( pack)?(\))?)|((70|83|84|100|120)(s|'s)?)))((((\()?(red|gold|silver|king)( pack)?(\))? )?((70|83|84|100|120)(s|'s)? )?)(\bbox\b))
Just trying to read this causes my brain to bleed. I can split the patterns up in code and reuse them easily enough, but ... am I missing something??

pattern1 = ((\()?(red|gold|silver|king)( pack)?(\))?)
pattern2 = ((70|83|84|100|120)(s|'s)?) 
(?=(pattern1|pattern2)) (pattern1)? (pattern2)? pattern3

I have more complex patterns, so this method will continue to work, but am I missing something or are there new regex methods??
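For reference, the "split the patterns up in code and reuse them" approach could look roughly like the sketch below. It is an illustration only: the names PACK, LENGTH, BOX and find_prefixed_box are made up here, and the groups are written as non-capturing so that reusing a fragment twice does not shift group numbers.

```python
import re

# Reusable fragments (names are hypothetical, chosen for this sketch).
PACK = r"\(?(?:red|gold|silver|king)(?: pack)?\)?"   # pattern1
LENGTH = r"(?:70|83|84|100|120)(?:s|'s)?"            # pattern2
BOX = r"\bbox\b"                                     # pattern3

# The lookahead requires at least one prefix; the two optional groups
# then consume whichever prefixes are present, in that order.
PREFIXED_BOX = re.compile(
    rf"(?=(?:{PACK})|(?:{LENGTH}))(?:{PACK} )?(?:{LENGTH} )?{BOX}",
    re.IGNORECASE,
)


def find_prefixed_box(text):
    """Return the matched 'prefix(es) + box' substring, or None."""
    match = PREFIXED_BOX.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for line in ["Marlboro Gold Pack 100's Box", "Marlboro Box"]:
        print(line, "->", find_prefixed_box(line))
```

A string with no prefix, such as "Marlboro Box", is rejected because the lookahead fails at every position where "box" could start.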


CodePudding user response:

Behold:

(?:gold|red|silver|king|pack|[() ])+(?:\d+)?(?:'|s| )+?box$

Note: this has a side effect of capturing one leading space.

https://regex101.com/r/QKEX6f/1

CodePudding user response:

You might write the pattern without a lookahead, and use a single conditional for matching a closing parenthesis only when there is an opening parenthesis.

The numbers 83 and 84 can be matched with the character class 8[34], 100 and 120 with 1[02]0, and s|'s can be shortened to '?s
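A quick way to convince yourself that the shortened forms accept exactly the same strings is to compare them on a handful of candidates. The helper name same_matches and the candidate lists are made up for this sketch.

```python
import re


def same_matches(long_form, short_form, candidates):
    """True if both patterns fully match exactly the same candidates."""
    return all(
        bool(re.fullmatch(long_form, c)) == bool(re.fullmatch(short_form, c))
        for c in candidates
    )


# Lengths: the long alternation versus the character-class version.
numbers_agree = same_matches(
    r"70|83|84|100|120",
    r"70|8[34]|1[02]0",
    ["70", "83", "84", "85", "100", "110", "120", "130"],
)

# Suffix: s|'s versus '?s.
suffixes_agree = same_matches(r"s|'s", r"'?s", ["", "s", "'s", "'", "ss"])

if __name__ == "__main__":
    print(numbers_agree, suffixes_agree)
```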

(?i)(?<!\S)(?:(\()?\b(?:red|gold|silver|king) pack(?(1)\))(?: (?:70|8[34]|1[02]0)'?s)?|(?:\b(?:red|gold|silver|king) )?\b(?:70|8[34]|1[02]0) ?(?:'s)?) box\b

The pattern matches:

  • (?i) Inline modifier for a case insensitive match
  • (?<!\S) Assert a whitespace boundary to the left
  • (?: Non capture group
    • (\()? Optional capture group 1, match (
    • \b(?:red|gold|silver|king) pack A word boundary, match any of the alternatives and pack
    • (?(1)\)) Conditional, match ) if capture group 1 exists
    • (?: (?:70|8[34]|1[02]0)'?s)? Optionally match a space, any of the alternatives, optional ' and s
    • | Or
    • (?:\b(?:red|gold|silver|king) )? Optionally match a word boundary and any of the alternatives followed by a space
    • \b(?:70|8[34]|1[02]0) ?(?:'s)? A word boundary, match any of the numbers, optional space and optionally match 's
  • ) Close non capture group
  • box\b Match box followed by a word boundary
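The pattern uses a conditional group, which Python's re module supports, so it can be tried directly against the sample lines from the question. This is only a small test harness; the names PATTERN, SAMPLES and matched_part are made up for the sketch.

```python
import re

# The pattern from this answer, split across lines only for readability.
PATTERN = re.compile(
    r"(?i)(?<!\S)(?:(\()?\b(?:red|gold|silver|king) pack(?(1)\))"
    r"(?: (?:70|8[34]|1[02]0)'?s)?"
    r"|(?:\b(?:red|gold|silver|king) )?\b(?:70|8[34]|1[02]0) ?(?:'s)?) box\b"
)

SAMPLES = [
    "Marlboro 100's Box",
    "Marlboro Gold Pack 100's Box",
    "Marlboro Special Blend (Gold Pack) 100's Box",
    "Marlboro Silver Pack Box",
    "Marlboro Special Blend (Red Pack) 100s Box",
    "Pall Mall RED 100 BOX",
    "Marlboro Special Blend (Gold Pack) Box",
]


def matched_part(text):
    """Return the substring matched by PATTERN, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for line in SAMPLES:
        print(f"{line!r} -> {matched_part(line)!r}")
```

Because of the conditional, a closing parenthesis is only consumed when an opening one was captured, so "(Gold Pack)" is matched as a unit while a bare "Gold Pack" is matched without parentheses.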

