I have the following string (file):
s = r'''
\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
'''
Now I would like to use Python's re package to extract the data into a dictionary where the keys are the command names (\commandName1, \commandName2 and \commandName3) and the values are "This is first command", "This is second command with {} brackets inside in multiple lines {} {}" and "This is third, last command". I tried something like:
re.findall(r'\\newcommand{(.+)}{(.+)}', s)
but it doesn't work because the second command has {} inside. What is the easiest way to do that?
CodePudding user response:
You may use this regex:
\\newcommand{([^}]+)}{(.+?)}(?=\s*\\newcommand|\Z)
RegEx Breakdown:
\\newcommand : Match the literal text \newcommand
{ : Match a {
([^}]+) : Match 1 or more of any characters that are not } in capture group #1
} : Match a }
{ : Match a {
(.+?) : Match 1 or more of any characters, as few as possible, in capture group #2
} : Match a }
(?=\s*\\newcommand|\Z) : Lookahead to assert presence of 0 or more whitespace and \newcommand, or else end of input
CodePudding user response:
Try (regex101):
import re
s = r"""\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
"""
out = re.findall(
    r"^\\newcommand\{(.*?)\}\{((?:(?!^\\newcommand).)+)\}", s, flags=re.S | re.M
)
print(out)
Prints:
[
    ("\\commandName1", "This is first command"),
    (
        "\\commandName2",
        "This is second command with {} brackets inside\nin multiple lines {} {}\n",
    ),
    ("\\commandName3", "This is third, last command"),
]
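Since the question asks for a dictionary with single-line values, the list of tuples can be post-processed. The following is only a sketch: the whitespace collapsing with " ".join(body.split()) is an assumption about the desired output, and the names pairs and commands are illustrative.

```python
import re

s = r"""\newcommand{\commandName1}{This is first command}
\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}
\newcommand{\commandName3}{This is third, last command}
"""

# Same pattern as above: each body runs until the last "}" before the next
# line that starts with \newcommand (or before the end of the input).
pairs = re.findall(
    r"^\\newcommand\{(.*?)\}\{((?:(?!^\\newcommand).)+)\}", s, flags=re.S | re.M
)

# Build the dictionary, collapsing runs of whitespace (including newlines)
# inside each body into single spaces.
commands = {name: " ".join(body.split()) for name, body in pairs}

print(commands[r"\commandName2"])
# This is second command with {} brackets inside in multiple lines {} {}
```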