I have the following string:
tex = r'''
...
\begin{tabular}
abcdefg
\endhead
\endlastfoot
...
'''
I want to extract the code between \begin{tabular} and \endlastfoot, or between \begin{tabular} and \endhead if \endlastfoot doesn't exist. The following doesn't work as I want:
import re
res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)
What should I change to avoid multiple if statements and re.search calls?
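A minimal reproduction of why the pattern in the question falls short, using the sample string from the question. With re.DOTALL, the greedy .* runs all the way to the end of the input, and both optional groups (?:...)? then succeed by matching the empty string, so the match never stops at either marker.

```python
import re

# Sample input from the question (raw string, so the backslashes stay literal).
tex = r'''
...
\begin{tabular}
abcdefg
\endhead
\endlastfoot
...
'''

# The pattern from the question.
res = re.search(r"\\begin{tabular}.*(?:\\endhead)?(?:\\endlastfoot)?", tex, re.DOTALL)

# .* consumes everything up to the end of the string; the optional groups
# then match nothing, so the trailing "..." ends up in the result.
print(repr(res.group(0)))
```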
Answer:
/((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))|((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))/gm
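The pattern above is written with JavaScript-style /.../gm delimiters. A sketch of the same pattern used from Python with a single re.search call: Python has no "g" flag (re.search simply returns the first match), and the "m" flag is not needed here because the character class already lists \n explicitly.

```python
import re

tex = r'''
...
\begin{tabular}
abcdefg
\endhead
\endlastfoot
...
'''

# The answer's two alternatives, copied without the /.../gm delimiters.
pattern = (
    r"((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))"
    r"|((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))"
)

m = re.search(pattern, tex)
# group(0) is the whole match; group(1) or group(2) tells which alternative won.
print(repr(m.group(0)) if m else None)
```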
The pattern consists of two groups joined by | (alternation):
((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endlastfoot))
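(?<=\\begin{tabular})
is a positive lookbehind. It does not consume any text; it only asserts that the match starts immediately after the string \begin{tabular}.
(?=\\endlastfoot)
is a positive lookahead. It also consumes nothing; it asserts that the match ends immediately before \endlastfoot.
[\n\w\s\\]*
matches the code block between the two marker strings. Inside the brackets, \n matches newlines, \w matches word characters (a-zA-Z0-9_ for ASCII text), \s matches whitespace (which already includes \n), and \\ matches a backslash. The * repeats this set zero or more times, so the block may contain only those characters.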
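((?<=\\begin{tabular})[\n\w\s\\]*(?=\\endhead))
works like the first group. The only difference is that it matches the text between \begin{tabular} and \endhead. Because the alternatives are tried left to right at the position after \begin{tabular}, this second group is used only when the first one cannot match (for example, when there is no \endlastfoot).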
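Real tabular rows usually contain characters outside [\n\w\s\\], such as &, {, } or %, and the lookaround pattern above then finds no match at all. A sketch of an alternative, assuming any characters may appear in the table body: it still uses a single re.search, but matches everything with re.DOTALL and puts the preference into the alternation itself. The helper name extract_tabular and the constant TABULAR_RE are hypothetical. The first alternative stops at the first \endlastfoot; only if none follows does the regex engine fall back to the second, which stops at the first \endhead.

```python
import re

# Hypothetical helper pattern: \begin{tabular}, then either the text up to the
# first \endlastfoot or, failing that, the text up to the first \endhead.
TABULAR_RE = re.compile(
    r"\\begin\{tabular\}"
    r"(?:(.*?)\\endlastfoot"   # alternative 1: tried first
    r"|(.*?)\\endhead)",       # alternative 2: used only if alternative 1 fails
    re.DOTALL,                 # let . match newlines as well
)


def extract_tabular(tex):
    """Return the table body, or None if neither end marker is found."""
    m = TABULAR_RE.search(tex)
    if m is None:
        return None
    # Only one branch of the alternation participates, so the other group is None.
    return m.group(1) if m.group(1) is not None else m.group(2)


with_foot = "\\begin{tabular}\na & b \\\\\n\\endhead\n\\endlastfoot\n"
without_foot = "\\begin{tabular}\na & b \\\\\n\\endhead\n"

print(repr(extract_tabular(with_foot)))
print(repr(extract_tabular(without_foot)))
```

The check uses "is not None" rather than "or" so that an empty table body ("") is returned as-is instead of falling through to the other group.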