Regex: quantifiers on capture groups- stumped

Time:08-25

I am stumped by what seems like a simple regular expression problem: I am trying to parse some text - the general format is

<name> (NOTE <notetext>) (ICON <icontype>)

where both the NOTE and ICON parts are optional

For example

  • Factory NOTE looks like a refinery ICON attraction
  • Strange mountain ICON camera
  • Just a name
  • Tower NOTE probably a cell tower

My attempt at this was

^(.+)(NOTE (.+))?(ICON (.+))?

The first capture group (i.e. the <name> part) is easy: just any characters from the beginning of the line for the name.

Then I have a second, optional (hence the "?" after it) capture group for the notetext, and a third, also optional (hence the second "?") capture group for the icontype.

I think I am missing something simple and obvious, but I can't see it!

CodePudding user response:

You should use a non-greedy (lazy) quantifier so that the first group captures only the minimum required. Otherwise the greedy match consumes the whole line, and because the later groups are optional, they never match anything.

You also need to add $ at the end so that the whole line has to match; that forces the first group to expand until it captures the intended text.
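To see both points in action, here is a minimal sketch in Python (the thread itself is about iOS Shortcuts, so the choice of Python's re module is an assumption made purely for illustration). It also assumes the asker's pattern used ".+" for "one or more of any character". The helper name first_group is hypothetical.

```python
import re

# Assumed reading of the asker's attempt: greedy ".+" everywhere, no end anchor.
GREEDY = re.compile(r"^(.+)(NOTE (.+))?(ICON (.+))?")
# Lazy quantifiers but still no "$": the match is free to stop early.
LAZY_NO_ANCHOR = re.compile(r"^(.+?)(?: NOTE (.+?))?(?: ICON (.+))?")
# Lazy quantifiers plus "$": the whole line has to be consumed.
LAZY_ANCHORED = re.compile(r"^(.+?)(?: NOTE (.+?))?(?: ICON (.+))?$")

line = "Factory NOTE looks like a refinery ICON attraction"


def first_group(pattern, text):
    """Return what the first capture group matched, or None if no match."""
    m = pattern.match(text)
    return m.group(1) if m else None


for label, pattern in [
    ("greedy", GREEDY),
    ("lazy, no $", LAZY_NO_ANCHOR),
    ("lazy with $", LAZY_ANCHORED),
]:
    print(f"{label:12} -> {first_group(pattern, line)!r}")
```

The greedy version puts the entire line into group 1, the lazy version without $ stops after the single character "F", and only the lazy version with $ yields "Factory".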

Check this demo

CodePudding user response:

Thanks to atnbueno on the iOS Shortcuts Discord, who provided this correction (and to Pushpesh Kumar Rajwanshi, who pointed in the same direction):

^(.+?)(?: NOTE (.+?))?(?: ICON (.+))?$

(i.e. it needed lazy quantifiers, because the default is greedy matching; see https://javascript.info/regexp-greedy-and-lazy)
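As a quick check against the four sample lines from the question, the following Python sketch applies a pattern with lazy quantifiers, non-capturing wrappers for the optional parts, and a trailing $. Python is an assumption here (the original context is iOS Shortcuts), and the names ENTRY and parse_entry are hypothetical.

```python
import re

# name, then an optional " NOTE <notetext>", then an optional " ICON <icontype>".
# The (?: ... ) wrappers group without capturing, so only three groups exist.
ENTRY = re.compile(r"^(.+?)(?: NOTE (.+?))?(?: ICON (.+))?$")


def parse_entry(line):
    """Split one line into name / note / icon; missing parts come back as None."""
    m = ENTRY.match(line)
    if m is None:
        return None  # e.g. an empty line
    name, note, icon = m.groups()
    return {"name": name, "note": note, "icon": icon}


samples = [
    "Factory NOTE looks like a refinery ICON attraction",
    "Strange mountain ICON camera",
    "Just a name",
    "Tower NOTE probably a cell tower",
]

for s in samples:
    print(parse_entry(s))
```

Each line yields its name, with note and icon filled in only when the corresponding keyword is present.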
