Substituting XML tags with LaTeX commands using Beautiful Soup

Time:06-25

I need to convert an XML document to LaTeX. For example, I want to convert

<?xml version="1.0" encoding="UTF-8"?>

<foo>
    12345
    <bar>
        67890
    </bar>
</foo>

to

\foo{12345\bar{67890}}

If I do

from bs4 import BeautifulSoup

with open("foobar.xml") as fp:
    soup = BeautifulSoup(fp, 'xml')

for tag in soup.find_all("foo"):
    tag.replaceWith(f"""{tag.text}""")

then the inner <bar> tag is lost, because .text returns only the concatenated text of all descendants. If I use .contents instead of .text, I get a list rather than a string.
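A quick way to see the difference between .text and .contents on the sample document is the check below. It is a sketch that assumes bs4 and lxml are installed, because the "xml" parser feature requires lxml. The XML declaration is left out to keep the string minimal.

```python
from bs4 import BeautifulSoup, Tag

# Same structure as the question, without the XML declaration
xml_doc = """<foo>
    12345
    <bar>
        67890
    </bar>
</foo>"""

soup = BeautifulSoup(xml_doc, "xml")  # the "xml" feature requires lxml

# .text joins every descendant string, so the <bar> markup disappears
flat = soup.foo.text
print(repr(flat))

# .contents is a list of direct children: NavigableString and Tag objects
children = soup.foo.contents
print([type(c).__name__ for c in children])

# The nested element is still available as a Tag inside .contents
nested = [c for c in children if isinstance(c, Tag)]
print([t.name for t in nested])
```

So .text is the wrong level to work at, while .contents keeps the structure you need but has to be walked element by element.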

I have tried soup.find('foo').unwrap(), but it just removes the tag without replacing it with anything.

I know I can first replace bar and then foo, but I would like a solution that doesn't depend on the order of the tags.

Answer:

Try using recursion: build the LaTeX string for a tag from its children, calling the same function on every nested tag.

from bs4 import BeautifulSoup, NavigableString, Tag


xml_doc = """\
<?xml version="1.0" encoding="UTF-8"?>
<foo>
    12345
    <bar>
        67890
    </bar>
</foo>"""


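def write(tag):
    # Open the LaTeX command for this tag, e.g. "\foo{"
    s = "\\" + tag.name + "{"
    for c in tag.contents:
        if isinstance(c, Tag):
            # Nested element: convert it recursively
            s += write(c)
        elif isinstance(c, NavigableString):
            # Text node: drop the surrounding indentation whitespace
            s += c.strip()
    # Close the command
    return s + "}"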


soup = BeautifulSoup(xml_doc, "xml")
print(write(soup.foo))

Prints:

\foo{12345\bar{67890}}
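If you would rather do the substitution inside the soup itself, closer to the unwrap() attempt in the question, one alternative (not part of the answer above) is to add the LaTeX opening text as each tag's first child and the closing brace as its last child, then unwrap the tag. Because unwrap() keeps the children in place, nested tags survive and the order in which tags are visited does not matter. This is a sketch assuming bs4 with lxml installed, that indentation whitespace can be discarded, and that no LaTeX special characters (such as % or _) need escaping.

```python
from bs4 import BeautifulSoup

xml_doc = """<foo>
    12345
    <bar>
        67890
    </bar>
</foo>"""

soup = BeautifulSoup(xml_doc, "xml")  # the "xml" feature requires lxml

# Strip the indentation whitespace from every text node first
for s in soup.find_all(string=True):
    s.replace_with(s.strip())

# Surround each tag's children with "\name{" ... "}" and then unwrap the tag.
# unwrap() keeps the children, so visiting foo before bar (or the reverse)
# produces the same result.
for tag in soup.find_all(True):
    tag.insert(0, "\\" + tag.name + "{")
    tag.append("}")
    tag.unwrap()

latex = soup.get_text()
print(latex)
```

The trade-off is that this mutates the soup, whereas the recursive write() leaves the parsed tree untouched and simply returns a string.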