I have something like this:
DoA(DoB(DoC()));
I am trying to write a regex that would return the following:
DoA(DoB(DoC()))
DoB(DoC())
DoC()
The number of nested functions is unknown.
I've tried to do something with (?R), but it's not working quite right:
[a-zA-Z]+\((?:[^()]|((?R)))*\)
CodePudding user response:
You can use
(?=\b([a-zA-Z]+(\((?:[^()]+|(?2))*\))))
See the regex demo. You will need to extract Group 1 values.
Details:
- (?= - start of a positive lookahead (to enable overlapping matches that do not share the same start position)
- \b - a word boundary
- ([a-zA-Z]+(\((?:[^()]+|(?2))*\))) - Group 1:
  - [a-zA-Z]+ - one or more ASCII letters (use \p{L} to match any Unicode letter)
  - (\((?:[^()]+|(?2))*\)) - Group 2: a ( char, then zero or more repetitions of either one or more chars other than ( and ) or Group 2 recursed, and then a ) char
- ) - end of the positive lookahead.
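For engines without recursion support (Python's built-in re module is one), the same substrings can be approximated with a plain stack-based scan instead of the lookahead pattern. The following is a minimal sketch of that swapped-in technique, not the regex itself; the function name nested_calls is hypothetical, and unlike the pattern it does not enforce the \b word boundary before the name.

```python
import string


def nested_calls(text):
    """Collect every name(...) substring whose parentheses are balanced.

    Hypothetical helper: a stack-based scan, not the recursive regex.
    Results are ordered by start position, outermost call first.
    """
    results = []
    stack = []  # for each open "(", the start index of its name (or None)
    for i, ch in enumerate(text):
        if ch == "(":
            # Walk back over ASCII letters to find where the name starts.
            start = i
            while start > 0 and text[start - 1] in string.ascii_letters:
                start -= 1
            # A "(" with no name in front of it is tracked but not reported.
            stack.append(start if start < i else None)
        elif ch == ")" and stack:
            start = stack.pop()
            if start is not None:
                results.append((start, text[start:i + 1]))
    results.sort()
    return [substring for _, substring in results]


for call in nested_calls("DoA(DoB(DoC()));"):
    print(call)
```

An unmatched ) is simply skipped, and a ( that is never closed produces no result, so only balanced calls are returned.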
CodePudding user response:
You could extract the desired strings or conclude that the parentheses are unbalanced by sequentially matching the regular expression
^[^()]*\((.*)\)[^()]*$
Initially, the original string is matched; after each match, the contents of capture group 1 are matched against the same expression. The contents of the capture group are saved after each successful match.
If, for example, the string were
"DoA(DoB(DoC(Doc())))"
after the first match the capture group would hold the string
"DoB(DoC(Doc()))"
which would be saved. Upon matching the regular expression to this string capture group 1 would contain
"DoC(Doc())"
Next, matching this string against the regular expression would populate the capture group with
"Doc()"
When this string is matched the capture group would contain an empty string, which would not itself be matched.
We are finished when no further match is made. If the contents of the capture group from the final match contain neither a left nor a right parenthesis, we conclude that the parentheses are balanced. That is the case here, where we have sequentially extracted the strings
"DoB(DoC(Doc()))"
"DoC(Doc())"
"Doc()"
""
As a second example, if the string were
"DoA(DoB(DoC(Doc((Dod)))))"
the following strings would be extracted sequentially:
"DoB(DoC(Doc((Dod))))"
"DoC(Doc((Dod)))"
"Doc((Dod))"
"(Dod)"
"Dod"
Now consider a string with unbalanced parentheses.
"DoA)DoB((DoC()))"
The regular expression does not match this string. Since the string contains at least one left or right parenthesis, we conclude that the parentheses are not balanced.
Here is a second example of a string with unbalanced parentheses:
"DoA(DoB((DoC(()))"
Capture group 1 would sequentially contain the following strings:
"DoB((DoC(())"
"(DoC(()"
"DoC(("
We conclude that the parentheses are unbalanced because the regular expression does not match "DoC((" and that string contains at least one left or right parenthesis.
The regular expression can be broken down as follows.
^        # match beginning of string
[^()]*   # match >= 0 characters other than '(' and ')'
\(       # match '('
(.*)     # match >= 0 characters other than line terminators and save
         # to capture group 1
\)       # match ')'
[^()]*   # match >= 0 characters other than '(' and ')'
$        # match end of string
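The sequential matching described in this answer could be sketched in Python roughly as follows. The function name peel and its return shape are assumptions made for illustration. The sketch also assumes purely nested calls, as in the question: a string with sibling groups such as "f(a)(b)" would be reported as unbalanced by this procedure.

```python
import re

# The expression from the answer, unchanged.
PATTERN = re.compile(r"^[^()]*\((.*)\)[^()]*$")


def peel(text):
    """Repeatedly match PATTERN, feeding capture group 1 back in.

    Hypothetical helper. Returns (extracted, balanced): the list of
    saved capture-group contents and whether the last string tested
    is free of parentheses.
    """
    extracted = []
    current = text
    while True:
        match = PATTERN.match(current)
        if match is None:
            break  # no further match: we are finished
        current = match.group(1)
        extracted.append(current)
    # Balanced only if the string that failed to match has no "(" or ")".
    balanced = "(" not in current and ")" not in current
    return extracted, balanced


print(peel("DoA(DoB(DoC(Doc())))"))
print(peel("DoA(DoB((DoC(()))"))
```

Each successful match removes at least one pair of parentheses, so the loop always terminates.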