String1: {{word1|word2|word3 (word4 word5)|word6}}
String2: {{word1|word2|word3|word6}}
With this regex:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})
I can capture the parts of String2 as groups. How can I change the regex to also capture (word4 word5) as a group?
CodePudding user response:
You can add a (?:\s*(\([^()]*\)))? subpattern:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})
The (?:\s*(\([^()]*\)))? part is an optional non-capturing group that matches one or zero occurrences of:
- \s* - zero or more whitespaces
- ( - start of a capturing group:
  - \( - a ( char
  - [^()]* - zero or more chars other than ( and )
  - \) - a ) char
- ) - end of the group.
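As a quick check of the pattern above, here is a minimal Python sketch. It assumes Python's re module (the question does not name a regex flavor), and the names PATTERN and capture are only illustrative.

```python
import re

# Pattern from the answer above; the optional parenthesised part is group 4.
PATTERN = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)"
    r"(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})"
)

def capture(text):
    """Return the five groups as a tuple, or None if the string does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None

print(capture("{{word1|word2|word3 (word4 word5)|word6}}"))
# ('word1', 'word2', 'word3', '(word4 word5)', 'word6')
print(capture("{{word1|word2|word3|word6}}"))
# ('word1', 'word2', 'word3', None, 'word6')
```

When the parenthesised part is absent, as in String2, group 4 simply comes back as None, so the other group numbers stay the same for both strings.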
If you need to make sure only whitespace-separated words are allowed inside the parentheses, replace [^()]* with \w+(?:\s+\w+)*, so that the inserted subpattern becomes (?:\s*(\(\w+(?:\s+\w+)*\)))?:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\(\w+(?:\s+\w+)*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})
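To see how the two variants differ, here is a small Python sketch, again assuming the re module. The helper build and the sample string with a comma inside the parentheses are made up for illustration and are not from the question.

```python
import re

WORDS = r"\w+(?:\s+\w+)*"  # one or more whitespace-separated words

def build(inner):
    """Assemble the full pattern with the given subpattern inside the parentheses."""
    parts = [
        r"(?<=\{\{)",
        "(", WORDS, r")\|(", WORDS, r")\|(", WORDS, ")",
        r"(?:\s*(\(", inner, r"\)))?",
        r"\|(", WORDS, r")(?=\}\})",
    ]
    return re.compile("".join(parts))

LOOSE = build(r"[^()]*")   # anything except parentheses
STRICT = build(WORDS)      # only whitespace-separated words

sample = "{{word1|word2|word3 (word4, word5)|word6}}"  # hypothetical input with a comma
loose_match = LOOSE.search(sample)
strict_match = STRICT.search(sample)

print(loose_match.group(4))  # (word4, word5)
print(strict_match)          # None
```

The loose variant accepts the comma because [^()]* allows any character other than a parenthesis, while the strict variant fails to match the whole string.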
CodePudding user response:
You could simplify the expression by matching the desired substrings rather than capturing them. For that you could use the following regular expression.
(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)
The elements of the expression are as follows.
(?<=        # begin a positive lookbehind
  [{| ]     # match one of the indicated characters
)           # end the positive lookbehind
\w+         # match one or more word characters
(?=         # begin a positive lookahead
  [}| ]     # match one of the indicated characters
)           # end the positive lookahead
|           # or
\(          # match a literal ( character
[\w ]+      # match one or more of the indicated characters
\)          # match a literal ) character
Note that this does not validate the format of the string.
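In Python this approach could look like the sketch below, which uses re.findall to collect the matches. The names TOKEN and tokens are only illustrative.

```python
import re

# Match a bare word delimited by {, | or space on the left and }, | or space
# on the right, or a parenthesised run of word characters and spaces.
TOKEN = re.compile(r"(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)")

def tokens(text):
    """Return all matched substrings in order of appearance."""
    return TOKEN.findall(text)

print(tokens("{{word1|word2|word3 (word4 word5)|word6}}"))
# ['word1', 'word2', 'word3', '(word4 word5)', 'word6']
print(tokens("{{word1|word2|word3|word6}}"))
# ['word1', 'word2', 'word3', 'word6']
```

Because nothing is validated, a string without any braces still produces matches: tokens("x word1 y") returns ['word1'].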