Get words in parenthesis as a group regex

Time:10-24

String1: {{word1|word2|word3 (word4 word5)|word6}}

String2: {{word1|word2|word3|word6}}

With this regex:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})

I can capture each word of String2 in its own group. How can I change the regex so that (word4 word5) is also captured as a group?
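A minimal Python sketch of the problem, assuming the regex is meant to use `\w+` and `\s+` (one or more word characters / whitespace). It shows the four groups for String2 and that String1 does not match at all, because after word3 the pattern expects | but finds " (".

```python
import re

# The question's regex: four pipe-separated runs of space-separated
# words, preceded by "{{" and followed by "}}".
PATTERN = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|"
    r"(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})"
)

string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"

m2 = PATTERN.search(string2)
print(m2.groups())               # ('word1', 'word2', 'word3', 'word6')

# No match: "(" is not a word character, so the third group stops at
# "word3" and the following "|" cannot match the space after it.
print(PATTERN.search(string1))   # None
```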

Answer:

You can add an optional (?:\s*(\([^()]*\)))? subpattern right after the third word group:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})

See the regex demo.
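A quick check in Python of the pattern above (a sketch; the linked demo is not reproduced here). Note that the parenthesized text becomes group 4 and the last word shifts to group 5; when there are no parentheses, group 4 is None.

```python
import re

PATTERN = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)"
    r"(?:\s*(\([^()]*\)))?"      # optional "(...)" after the third group
    r"\|(\w+(?:\s+\w+)*)(?=\}\})"
)

for s in ("{{word1|word2|word3 (word4 word5)|word6}}",
          "{{word1|word2|word3|word6}}"):
    print(PATTERN.search(s).groups())
# ('word1', 'word2', 'word3', '(word4 word5)', 'word6')
# ('word1', 'word2', 'word3', None, 'word6')
```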

The (?:\s*(\([^()]*\)))? part is an optional non-capturing group that matches zero or one occurrence of:

  • \s* - zero or more whitespaces
  • ( - start of a capturing group:
    • \( - a ( char
    • [^()]* - zero or more chars other than ( and )
    • \) - a ) char
  • ) - end of the capturing group.
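Testing the subpattern on its own makes its behaviour concrete. This sketch uses `re.fullmatch` so each test string must be consumed entirely; it shows that [^()]* accepts any characters except parentheses (punctuation included) and that nested parentheses are rejected.

```python
import re

# The optional subpattern from the answer, in isolation.
OPTIONAL_PARENS = re.compile(r"(?:\s*(\([^()]*\)))?")

print(OPTIONAL_PARENS.fullmatch(" (word4 word5)").group(1))  # '(word4 word5)'
print(OPTIONAL_PARENS.fullmatch("").group(1))                # None: group skipped
# [^()]* is not limited to words:
print(OPTIONAL_PARENS.fullmatch(" (a-b, c!)").group(1))      # '(a-b, c!)'
# [^()] excludes both parentheses, so nesting cannot match:
print(OPTIONAL_PARENS.fullmatch(" (a (b))"))                 # None
```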

If you need to make sure only whitespace-separated words are allowed inside the parentheses, replace [^()]* with \w+(?:\s+\w+)*, so the optional part becomes (?:\s*(\(\w+(?:\s+\w+)*\)))?:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\(\w+(?:\s+\w+)*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})

See this regex demo.
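To compare the two variants, this sketch builds both patterns from shared pieces (the `build` helper and the constant names are illustrative, not from the answer). Both match the original String1, but only the loose [^()]* version accepts punctuation inside the parentheses; the strict version makes the whole match fail.

```python
import re

WORDS = r"(\w+(?:\s+\w+)*)"                       # one captured run of words
PAREN_LOOSE = r"(?:\s*(\([^()]*\)))?"             # any non-paren characters
PAREN_STRICT = r"(?:\s*(\(\w+(?:\s+\w+)*\)))?"    # words and whitespace only

def build(paren):
    # Hypothetical helper: assemble the full pattern around a paren variant.
    return re.compile(r"(?<=\{\{)" + WORDS + r"\|" + WORDS + r"\|" + WORDS
                      + paren + r"\|" + WORDS + r"(?=\}\})")

loose = build(PAREN_LOOSE)
strict = build(PAREN_STRICT)

ok = "{{word1|word2|word3 (word4 word5)|word6}}"
odd = "{{word1|word2|word3 (a-b, c!)|word6}}"

print(strict.search(ok).groups())  # ('word1', 'word2', 'word3', '(word4 word5)', 'word6')
print(loose.search(odd).group(4))  # '(a-b, c!)'
print(strict.search(odd))          # None
```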

Answer:

You could simplify the expression by matching the desired substrings rather than capturing them, using the following regular expression:

(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)

Regex demo <¯\_(ツ)_/¯> Python demo

The elements of the expression are as follows.

(?<=     # begin a positive lookbehind
  [{| ]  # match one of the indicated characters
)        # end the positive lookbehind
\w+      # match one or more word characters
(?=      # begin a positive lookahead
  [}| ]  # match one of the indicated characters
)        # end positive lookahead
|        # or
\(       # match '('
[\w ]+   # match one or more of the indicated characters
\)       # match ')'
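The commented breakdown above can be compiled directly with Python's `re.VERBOSE` flag. In verbose mode whitespace is ignored except inside a character class, so the spaces in [{| ], [}| ] and [\w ] stay literal. This sketch checks that the verbose and compact forms find the same substrings.

```python
import re

VERBOSE = re.compile(r"""
    (?<=        # begin a positive lookbehind
      [{| ]     # match one of the indicated characters
    )           # end the positive lookbehind
    \w+         # match one or more word characters
    (?=         # begin a positive lookahead
      [}| ]     # match one of the indicated characters
    )           # end positive lookahead
    |           # or
    \(          # match '('
    [\w ]+      # match one or more word characters or spaces
    \)          # match ')'
""", re.VERBOSE)

COMPACT = re.compile(r"(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)")

s = "{{word1|word2|word3 (word4 word5)|word6}}"
print(VERBOSE.findall(s))
# ['word1', 'word2', 'word3', '(word4 word5)', 'word6']
```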

Note that this does not validate the format of the string.
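To illustrate that caveat, this sketch runs the expression on strings that lack the {{...}} wrapper (the inputs are made up for illustration). Words are still extracted, whereas the capturing regexes in the first answer would find no match at all.

```python
import re

MATCHER = re.compile(r"(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)")

print(MATCHER.findall("{a|b c}"))   # ['a', 'b', 'c']  (single braces)
print(MATCHER.findall("x|y z|"))    # ['y', 'z']       (no braces at all)
```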
