How to use `re.findall` to extract data from string

Time:10-16

I have the following string (file):

s = r'''
\newcommand{\commandName1}{This is first command}

\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}

\newcommand{\commandName3}{This is third, last command}

'''

Now I would like to use the Python re package to extract the data into a dictionary where the keys are the command names (\commandName1, \commandName2 and \commandName3) and the values are "This is first command", "This is second command with {} brackets inside in multiple lines {} {}" and "This is third, last command". I tried something like:

re.findall(r'\\newcommand{(.+)}{(.+)}', s)

but it doesn't work because the second command has {} inside. What is the easiest way to do that?

CodePudding user response:

You may use this regex:

\\newcommand{([^}]+)}{(.+?)}(?=\s*\\newcommand|\Z)


RegEx Breakdown:

  • \\newcommand: Match the literal text \newcommand
  • {: Match a {
  • ([^}]+): Match 1+ of any characters that are not } in capture group #1
  • }: Match a }
  • {: Match a {
  • (.+?): Match 1+ of any characters (lazily, as few as possible) in capture group #2
  • }: Match a }
  • (?=\s*\\newcommand|\Z): Lookahead to assert presence of 0 or more whitespace and \newcommand or else end of input
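
A minimal sketch of how this regex might be applied in Python to build the dictionary the question asks for. Two details here are assumptions rather than part of the answer above: the `re.DOTALL` flag is needed so that `.` can cross the line breaks inside the second command, and the lookahead alternatives are grouped as `\s*(?:\\newcommand|\Z)` so that trailing blank lines after the last command do not prevent it from matching (as written, `\Z` is only accepted immediately after the closing brace). The braces are also escaped for readability, and the names `pattern` and `commands` are illustrative.

```python
import re

# Raw string, so that "\n" in "\newcommand" is not read as a newline escape.
s = r"""
\newcommand{\commandName1}{This is first command}

\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}

\newcommand{\commandName3}{This is third, last command}

"""

# Assumption: DOTALL lets "." match newlines; the grouped lookahead allows
# optional whitespace before either the next \newcommand or the end of input.
pattern = re.compile(
    r"\\newcommand\{([^}]+)\}\{(.+?)\}(?=\s*(?:\\newcommand|\Z))",
    flags=re.DOTALL,
)

# findall returns (name, body) tuples, which dict() accepts directly.
commands = dict(pattern.findall(s))

for name, body in commands.items():
    print(repr(name), "->", repr(body))
```

The body of the second command keeps its internal line breaks; they can be collapsed afterwards if single-line values are preferred.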

CodePudding user response:

Try:

import re

s = r"""\newcommand{\commandName1}{This is first command}

\newcommand{\commandName2}{This is second command with {} brackets inside
in multiple lines {} {}
}

\newcommand{\commandName3}{This is third, last command}

"""

out = re.findall(
    r"^\\newcommand\{(.*?)\}\{((?:(?!^\\newcommand).)+)\}", s, flags=re.S | re.M
)
print(out)

Prints:

[
    ("\\commandName1", "This is first command"),
    (
        "\\commandName2",
        "This is second command with {} brackets inside\nin multiple lines {} {}\n",
    ),
    ("\\commandName3", "This is third, last command"),
]
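
Since the question asks for a dictionary with the second value on a single line, the list of tuples can be post-processed. The following is only a sketch: `out` is copied from the printed result above, and the name `commands` and the whitespace normalization are assumptions about the desired final form.

```python
# Result of re.findall, copied from the output above.
out = [
    ("\\commandName1", "This is first command"),
    (
        "\\commandName2",
        "This is second command with {} brackets inside\nin multiple lines {} {}\n",
    ),
    ("\\commandName3", "This is third, last command"),
]

# str.split() with no arguments splits on any run of whitespace (including
# newlines) and drops leading/trailing whitespace; " ".join() then rebuilds
# the text with single spaces.
commands = {name: " ".join(body.split()) for name, body in out}

print(commands["\\commandName2"])
```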