Python regex - optional groups

Time:01-05

I have the following string

tex = r'''
...

\begin{tabular} 
abcdefg

\endhead

\endlastfoot

...
'''

and I want to extract the code between \begin{tabular} and \endlastfoot, or between \begin{tabular} and \endhead if \endlastfoot doesn't exist. The following doesn't work as I want:

res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)

What should I change to avoid multiple if statements and multiple re.search calls?

CodePudding user response:

/((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))|((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))/gm

The pattern has two alternatives, each in its own capturing group:

  • ((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))
    • (?<=\\begin{tabular}) is a positive lookbehind. The match must start immediately after the string \begin{tabular}, which is not included in the result
    • (?=\\endlastfoot) is a positive lookahead. The match must end immediately before \endlastfoot, which is also not included in the result
    • [\n\w\s\\]* matches the code block between the two markers
      • \n matches newlines (already covered by \s), \w matches word characters (a-zA-Z0-9_), \s matches whitespace, \\ matches a literal backslash. * repeats this set zero or more times.
  • ((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))
    • works like the first group. The only difference is that it matches between the strings \begin{tabular} and \endhead. As the second alternative, it is tried only when the first one cannot match at that position
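
The pattern above is written with /.../gm delimiters, which Python's re module does not use. The following is a minimal sketch of how the same two alternatives might be written for a single re.search call. The names START, BODY, PATTERN and extract_tabular_body are made up for illustration and are not part of the original answer.

```python
import re

# Lookbehind: the match must start right after \begin{tabular}.
START = r"(?<=\\begin\{tabular\})"
# Same character class as in the answer; \s already includes \n.
BODY = r"[\w\s\\]*"

# First alternative ends before \endlastfoot; the second one (ending
# before \endhead) is tried only if the first cannot match.
PATTERN = re.compile(
    START + BODY + r"(?=\\endlastfoot)"
    + "|"
    + START + BODY + r"(?=\\endhead)"
)

def extract_tabular_body(tex):
    """Return the text after the tabular start marker, or None if no match."""
    match = PATTERN.search(tex)
    return match.group(0) if match else None

# Sample input modelled on the string from the question.
tex = r'''
\begin{tabular}
abcdefg

\endhead

\endlastfoot
'''
print(repr(extract_tabular_body(tex)))
```

Note that the character class only accepts word characters, whitespace and backslashes. If the table body can contain other characters such as & or {, the class would need to be widened, or replaced by a lazy .*? together with re.DOTALL.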

regex101.com
