Regex to only match group when inside of certain characters

Time:11-15

I have this regex {=\((COMMENT|TYPE)\):(.\X+?)} that matches certain strings, but I only want it to match when the string it finds is inside a certain other string. For example, with this input https://i.imgur.com/TgPr407.png it does what I want, but I only want it to match when the strings are between {=(SETTINGS): and the ending }, so it should not match the ones underlined in red in this screenshot: https://i.imgur.com/14X2rPX.png

CodePudding user response:

I would approach this with two regular expressions. This requires no special regex capabilities and should work in most languages. The first regular expression extracts the contents between the opening {=(SETTINGS): and the closing }. This is only possible if we can assume that any { and } between the opening and closing sequences occur as they do in your example, i.e. nested only one level deep. Under that assumption, the contents between the opening and closing sequences are 0 or more occurrences of either of the following:

  1. 1 or more characters other than { or }
  2. A { followed by 0 or more characters other than { or } followed by a }
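As a minimal sketch of the grammar just described (the sample strings and the helper name settings_body are made up for illustration), the pattern accepts braces nested one level deep and finds no match once they nest two levels deep:

```python
import re

# Body = zero or more of: a run of non-brace characters, or one {...} group
# that itself contains no braces (i.e. exactly one level of nesting).
SETTINGS_RE = re.compile(r'\{=\(SETTINGS\):((?:[^{}]+|\{[^{}]*\})*)\}')

def settings_body(text):
    """Return the text inside the first {=(SETTINGS): ... } block, or None."""
    m = SETTINGS_RE.search(text)
    return m.group(1) if m else None

one_level = "{=(SETTINGS):a{b}c} tail"
two_levels = "{=(SETTINGS):a{b{c}}d}"

print(settings_body(one_level))   # a{b}c
print(settings_body(two_levels))  # None
```

If deeper nesting can occur in the real input, this approach would need a recursive pattern or a small hand-written parser instead.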

Once the contents have been extracted, a second regex can do the rest of the work. The following is coded in Python (you did not specify a language, so I am free to choose). I am specifying (?x) at the start of each regular expression. This turns on verbose mode, in which whitespace in the pattern is ignored, so the regex can span multiple lines and carry comments that make it self-documenting.
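To illustrate what verbose mode changes, here is a small sketch (the sample string is invented) that writes the same pattern in compact and verbose form and checks that both find the same matches:

```python
import re

compact = r'\{=\((?:COMMENT|TYPE)\):[^}]*\}'

verbose = r'''(?x)
\{=\((?:COMMENT|TYPE)\):   # opening tag: {=(COMMENT): or {=(TYPE):
[^}]*                      # the value: anything up to the next }
\}                         # closing brace
'''

sample = "{=(TYPE):number} {=(prefix):!} {=(COMMENT):a b}"

compact_hits = re.findall(compact, sample)
verbose_hits = re.findall(verbose, sample)
print(compact_hits == verbose_hits)  # True
print(verbose_hits)  # ['{=(TYPE):number}', '{=(COMMENT):a b}']
```

Note that in verbose mode a literal space or # in the pattern has to be escaped or placed inside a character class, otherwise it is ignored or starts a comment.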

Next time, please do not use images. Instead, copy and paste the actual text into your question so that the input string can be created easily. Since I could not do that, I have used my own, shorter input.

First regex:

{=\(SETTINGS\):((?:[^{}]+|(?:{[^{}]*}))*)}


Second regex:

{=\((?:COMMENT|TYPE)\):[^}]*}


import re

s = """{=(SETTINGS):
{=(prefix):!}
{=(COMMENT):This is a comment}
{=(prefix):!}
{=(TYPE):number}
{=(prefix):!}
}
{=(COMMENT):This is to be ignored}
"""

pattern = r'''(?x)
{=\(SETTINGS\):       # {=(SETTINGS):
(                     # start of capture group 1
  (?:                 # start of non-capturing group
      [^{}]+          # one or more non {} characters
    |                 # or
      (?:{[^{}]*})    # balanced {} expression
  )                   # end of non capturing group
  *                   # 0 or more times
)                     # end of capture group 1
}                     # }
'''

m = re.search(pattern, s)
if m:
    s2 = m[1]
    pattern2 = r'''(?x)
{=\((?:COMMENT|TYPE)\):  # {=(COMMENT): or {=(TYPE):
[^}]*                    # 0 or more non-} characters
}                        # }
'''
    matches = re.findall(pattern2, s2)
    print(matches)

Prints:

['{=(COMMENT):This is a comment}', '{=(TYPE):number}']
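If the input may contain more than one SETTINGS block, or if the tag name and value are wanted separately (as the capture groups in the question's regex suggest), the two steps could be wrapped in a helper along these lines. This is only a sketch: the function name tags_in_settings and the sample input are made up, and it relies on the same one-level nesting assumption as above.

```python
import re

# Step 1: each {=(SETTINGS): ... } block, body captured in group 1.
BLOCK_RE = re.compile(r'\{=\(SETTINGS\):((?:[^{}]+|\{[^{}]*\})*)\}')
# Step 2: COMMENT/TYPE tags, with the name and the value captured separately.
TAG_RE = re.compile(r'\{=\((COMMENT|TYPE)\):([^}]*)\}')

def tags_in_settings(text):
    """Return (name, value) pairs for COMMENT/TYPE tags inside SETTINGS blocks."""
    pairs = []
    for block in BLOCK_RE.finditer(text):
        pairs.extend(TAG_RE.findall(block.group(1)))
    return pairs

sample = (
    "{=(SETTINGS):\n{=(COMMENT):first}\n{=(prefix):!}\n}\n"
    "{=(COMMENT):ignored}\n"
    "{=(SETTINGS):{=(TYPE):number}}\n"
)
print(tags_in_settings(sample))
# [('COMMENT', 'first'), ('TYPE', 'number')]
```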