Regular expression to match text between specific keywords

Time:04-26

I'm looking for a regular expression that matches the text between the keywords .if, .elseif, .else and .endif.

Example:

.if CONDITION1
code1
.elseif CONDITION2
code2
.else
code3
.endif

Ideally, the regular expression would just match code1, code2 and code3, but it is ok if it matches the text after the keywords as well (CONDITION1, CONDITION2...).

I've tried the following regex:

(?:\.if|\.else)(.*?)(?:\.else|\.endif)

but it misses code2

The regular expression also has to work when there is no .elseif and/or .else.

CodePudding user response:

Your issue is that you are effectively trying to capture overlapping matches: the final group consumes the .else of .elseif, so the next match cannot start there and the elseif clause is skipped. You can work around that by making the final group in your regex a lookahead instead.

(?:\.if|\.else(?:if)?)(.*?)(?=\.else|\.endif)

Note that to avoid capturing the conditions (or to make it easier to strip them out afterwards), the first group of the regex has to allow an optional if after .else; otherwise the captured text for an .elseif branch starts with "if CONDITION2". I've made that change above, but it isn't strictly necessary for the regex to work the way yours currently does.
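To make the difference concrete, here is a minimal Python sketch that runs both patterns on the example from the question. The variable names (text, consuming, lookahead) are only illustrative, and re.DOTALL is assumed so that . also matches newlines.

```python
import re

text = (
    ".if CONDITION1\n"
    "code1\n"
    ".elseif CONDITION2\n"
    "code2\n"
    ".else\n"
    "code3\n"
    ".endif\n"
)

# Pattern from the question: the trailing group consumes ".else",
# so the ".elseif" branch can no longer start a match.
consuming = re.compile(r"(?:\.if|\.else)(.*?)(?:\.else|\.endif)", re.DOTALL)

# Pattern from this answer: the trailing lookahead consumes nothing,
# so the next match can begin at the keyword it just looked at.
lookahead = re.compile(r"(?:\.if|\.else(?:if)?)(.*?)(?=\.else|\.endif)", re.DOTALL)

consuming_hits = [m.strip() for m in consuming.findall(text)]
lookahead_hits = [m.strip() for m in lookahead.findall(text)]

print(consuming_hits)  # the code2 branch is missing
print(lookahead_hits)  # all three branches, conditions still included
```

The conditions are still part of each captured group here, so they would have to be removed in a post-processing step.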

If the condition only occurs on the same line as the .if/.elseif, you can use @CarySwoveland's idea (see comments) to avoid capturing the condition:

(?:\.if|\.else(?:if)?).*\r?\n([\s\S]*?)(?=\.else|\.endif)

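This requires removing the DOTALL flag, so that .* stops at the end of the keyword line, while [\s\S] still matches across newlines in the captured part.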
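As a rough check of this variant, the following Python sketch applies it without re.DOTALL, both to the full example and to an input with only .if and .endif. The helper name branch_bodies is hypothetical and not part of the original answer.

```python
import re

# No re.DOTALL here: ".*" must stop at the end of the keyword line.
pattern = re.compile(
    r"(?:\.if|\.else(?:if)?).*\r?\n([\s\S]*?)(?=\.else|\.endif)"
)

def branch_bodies(source: str) -> list[str]:
    """Return the text of each branch with the conditions left out."""
    return [body.strip() for body in pattern.findall(source)]

full = (
    ".if CONDITION1\n"
    "code1\n"
    ".elseif CONDITION2\n"
    "code2\n"
    ".else\n"
    "code3\n"
    ".endif\n"
)
if_only = ".if CONDITION1\ncode1\n.endif\n"

print(branch_bodies(full))
print(branch_bodies(if_only))
```

This only behaves as intended when each condition sits on the same line as its keyword, as stated above.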

CodePudding user response:

You could match the regular expression

^(?![ \t]*\.(?:if|elseif|else|endif)\b).*

with the multiline flag set (causing ^ and $ to respectively match the beginning of each line, rather than the beginning and end of the string).

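This matches every line that does not begin with zero or more spaces or tabs followed by '.if', '.elseif', '.else' or '.endif'. For the string shown below, the lines containing 'code' and the empty line are matched. Note that '.code3' and '.code4' are matched as well, because they start with a period but not with one of the keywords.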

.if CONDITION1
code1  
.elseif CONDITION2
   code2
.code3

     .code4
  .elseif CONDITION3
     code5
  .else
     code6
.endif

The regular expression can be broken down as follows.

^                           # match beginning of line 
(?!                         # begin a negative lookahead
  [ \t]*                    # match zero or more tabs or spaces  
  \.                        # match a period
  (?:if|elseif|else|endif)  # match one of the strings in the non-capture group
  \b                        # match a word boundary
)                           # end negative lookahead
.*                          # match the line
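For reference, a small Python sketch of this line-based approach, using re.MULTILINE on a string modelled on the example above. The variable names are illustrative, and the sample is built without a trailing newline because in Python ^ with re.MULTILINE also matches after a final newline, which would add an extra empty match.

```python
import re

# Matches whole lines that do not start (after optional indentation)
# with one of the keywords.
non_keyword_line = re.compile(
    r"^(?![ \t]*\.(?:if|elseif|else|endif)\b).*", re.MULTILINE
)

sample = "\n".join([
    ".if CONDITION1",
    "code1",
    ".elseif CONDITION2",
    "   code2",
    ".code3",
    "",
    "     .code4",
    "  .elseif CONDITION3",
    "     code5",
    "  .else",
    "     code6",
    ".endif",
])

matched_lines = non_keyword_line.findall(sample)

for line in matched_lines:
    print(repr(line))
```

Leading indentation is kept in each match and the empty line shows up as an empty string, so a caller may want to strip or filter the results.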