I'm looking for a regular expression that matches the text between the keywords .if, .elseif, .else and .endif.
Example:
.if CONDITION1
code1
.elseif CONDITION2
code2
.else
code3
.endif
Ideally, the regular expression would just match code1, code2 and code3, but it is ok if it matches the text after the keywords as well (CONDITION1, CONDITION2...).
I've tried the following regex:
(?:\.if|\.else)(.*?)(?:\.else|\.endif)
but it misses code2
The regular expression must also work when there is no .elseif and/or .else.
CodePudding user response:
Your issue is that you are effectively trying to capture an overlapping match from the if to the elseif clause: the keyword that closes one match is consumed, so it cannot also open the next one. You can work around that by making the final group in your regex a lookahead instead.
(?:\.if|\.else(?:if)?)(.*?)(?=\.else|\.endif)
Note that to avoid capturing the conditions (or to simplify stripping them out in post-processing), you would need to change the first group of the regex to allow for an optional if after the .else. I've made that change above, but it's not strictly necessary to make the regex work as yours currently does.
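To see the difference between consuming the closing keyword and only looking ahead for it, here is a minimal Python sketch. It assumes Python's re module with the DOTALL flag set; the variable names are just illustrative.

```python
import re

text = """.if CONDITION1
code1
.elseif CONDITION2
code2
.else
code3
.endif
"""

# Original attempt: the closing keyword is consumed by the match, so the
# next match cannot start at it and the .elseif branch is skipped.
consuming = re.compile(r"(?:\.if|\.else)(.*?)(?:\.else|\.endif)", re.DOTALL)

# Lookahead version: the closing keyword is only asserted, not consumed,
# so it is still available to open the next match.
lookahead = re.compile(r"(?:\.if|\.else(?:if)?)(.*?)(?=\.else|\.endif)", re.DOTALL)

consuming_blocks = [m.group(1) for m in consuming.finditer(text)]
lookahead_blocks = [m.group(1) for m in lookahead.finditer(text)]

print(consuming_blocks)
print(lookahead_blocks)
```

With the lookahead, each branch yields one capture, still including the condition text that follows .if/.elseif.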
If the condition only occurs on the same line as the .if/.elseif, you can use @CarySwoveland's idea (see comments) to avoid capturing the condition:
(?:\.if|\.else(?:if)?).*\r?\n([\s\S]*?)(?=\.else|\.endif)
This requires removing the DOTALL flag, so that .* stops at the end of the .if/.elseif line instead of running across line breaks.
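A quick way to check this variant is the following Python sketch, again assuming the re module and this time no flags at all. The second input is there only to confirm that a block without .elseif or .else still works.

```python
import re

# No DOTALL: ".*" skips the rest of the keyword line (the condition),
# and "[\s\S]*?" then captures everything up to the next keyword.
pattern = re.compile(r"(?:\.if|\.else(?:if)?).*\r?\n([\s\S]*?)(?=\.else|\.endif)")

full = """.if CONDITION1
code1
.elseif CONDITION2
code2
.else
code3
.endif
"""

if_only = """.if CONDITION1
code1
.endif
"""

full_blocks = [m.group(1) for m in pattern.finditer(full)]
if_only_blocks = [m.group(1) for m in pattern.finditer(if_only)]

print(full_blocks)
print(if_only_blocks)
```

Each capture keeps its trailing newline, so you may want to strip it afterwards.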
CodePudding user response:
You could match the regular expression
^(?![ \t]*\.(?:if|elseif|else|endif)\b).*
with the multiline flag set (causing ^ and $ to respectively match the beginning and end of each line, rather than the beginning and end of the string).
This matches all lines that do not begin with zero or more spaces or tabs followed by '.if', '.elseif', '.else' or '.endif'. If the string is the one shown below, the lines containing 'code' are matched, including '.code3' and '.code4'; an empty line would be matched as well.
.if CONDITION1
code1
.elseif CONDITION2
code2
.code3
.code4
.elseif CONDITION3
code5
.else
code6
.endif
The regular expression can be broken down as follows.
^                            # match beginning of line
(?!                          # begin a negative lookahead
  [ \t]*                     # match zero or more tabs or spaces
  \.                         # match a period
  (?:if|elseif|else|endif)   # match one of the strings in the non-capture group
  \b                         # match a word boundary
)                            # end negative lookahead
.*                           # match the line
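As a rough check of the line-based approach, here is a small Python sketch using re.MULTILINE on the example string above. The input deliberately has no trailing newline, so no empty match is produced at the end.

```python
import re

# MULTILINE makes ^ match at the start of every line, not just the string.
pattern = re.compile(r"^(?![ \t]*\.(?:if|elseif|else|endif)\b).*", re.MULTILINE)

text = """.if CONDITION1
code1
.elseif CONDITION2
code2
.code3
.code4
.elseif CONDITION3
code5
.else
code6
.endif"""

# Every line that is not an .if/.elseif/.else/.endif directive is kept;
# ".code3" and ".code4" survive because "code3" is not one of the keywords.
kept_lines = pattern.findall(text)

print(kept_lines)
```

Note that this returns individual lines rather than one match per branch, so grouping lines by branch would need extra post-processing.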