Python regex - optional groups

I have the following string:

tex = r'''
...

\begin{tabular} 
abcdefg

\endhead

\endlastfoot

...
'''

and I want to extract the code between \begin{tabular} and \endlastfoot, or between \begin{tabular} and \endhead if \endlastfoot doesn't exist. The following doesn't work as I want:

res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)

What should I change to avoid multiple if statements and repeated re.search calls?
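
For reference, here is a minimal reproduction of the attempt, using the sample string from above. It suggests what goes wrong: with re.DOTALL the greedy .* runs to the end of the string, and both optional groups then match the empty string.

```python
import re

tex = r'''
...

\begin{tabular}
abcdefg

\endhead

\endlastfoot

...
'''

# Greedy ".*" with re.DOTALL consumes everything up to the end of the string,
# so the two optional groups are left to match nothing.
res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)

# The match spans from \begin{tabular} to the very end of tex,
# including the trailing "..." line.
print(repr(res.group(0)))
```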

CodePudding user response:

/((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))|((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))/gm

The pattern consists of two alternatives, each in its own group. The \endlastfoot alternative comes first, so it is tried first:

((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))
- (?<=\\begin{tabular}) is a positive lookbehind. The match must start immediately after the string \begin{tabular}
- (?=\\endlastfoot) is a positive lookahead. The match must end immediately before \endlastfoot
- [\n\w\s\\]* matches the code block between the two strings
  - \n for newlines (already covered by \s), \w for word characters (a-zA-Z0-9_), \s for whitespace, \\ for a backslash. * repeats this set zero or more times. Characters outside the set, such as & or {, are not matched.
((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))
- works like the first group. The only difference is that this group matches between the strings \begin{tabular} and \endhead
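
The pattern above is written in /.../gm notation, so here is a minimal sketch of the same idea for Python's re module. It keeps the two alternatives and their order, but swaps the character class for a lazy .*? under re.DOTALL, on the assumption that the table body may contain characters such as & or braces. The names pattern and extract_table_body are hypothetical and only used for illustration. Python's re requires a fixed-width lookbehind, which \\begin\{tabular\} satisfies.

```python
import re

tex = r'''
...

\begin{tabular}
abcdefg

\endhead

\endlastfoot

...
'''

# The \endlastfoot branch is tried first; the \endhead branch is only used
# when no \endlastfoot follows \begin{tabular}.
pattern = re.compile(
    r"(?<=\\begin\{tabular\})(?:.*?(?=\\endlastfoot)|.*?(?=\\endhead))",
    re.DOTALL,
)


def extract_table_body(source):
    """Return the text after \\begin{tabular} up to the end marker, or None."""
    match = pattern.search(source)
    return match.group(0) if match else None


# With both markers present, the result runs up to \endlastfoot.
print(repr(extract_table_body(tex)))

# Without \endlastfoot, the result stops at \endhead instead.
print(repr(extract_table_body(tex.replace("\\endlastfoot", ""))))
```

A single re.search call covers both cases, and the function returns None when neither marker follows \begin{tabular}. The result keeps its surrounding whitespace, so calling .strip() on it may be useful.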

regex101.com