I need to convert an XML document to LaTeX. Something like
<?xml version="1.0" encoding="UTF-8"?>
<foo>
12345
<bar>
67890
</bar>
</foo>
to
\foo{12345\bar{67890}}
If I do
from bs4 import BeautifulSoup
with open("foobar.xml") as fp:
    soup = BeautifulSoup(fp, 'xml')
for tag in soup.find_all("foo"):
    tag.replaceWith(f"""{tag.text}""")
then it removes the tags inside. If I use contents instead of text, it returns a list.
I have tried soup.find('foo').unwrap(), but it just removes the tag, without replacing it.
I know I can first replace bar and then foo, but I would like a solution that doesn't depend on the order of the tags.
Answer:
Try using recursion:
from bs4 import BeautifulSoup, NavigableString, Tag
xml_doc = """\
<?xml version="1.0" encoding="UTF-8"?>
<foo>
12345
<bar>
67890
</bar>
</foo>"""
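def write(tag):
    # Open the LaTeX command named after the tag.
    s = "\\" + tag.name + "{"
    for c in tag.contents:
        if isinstance(c, Tag):
            # Nested element: recurse so any depth and tag order works.
            s += write(c)
        elif isinstance(c, NavigableString):
            # Text node: drop the surrounding whitespace and newlines.
            s += c.strip()
    return s + "}"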
soup = BeautifulSoup(xml_doc, "xml")
print(write(soup.foo))
Prints:
\foo{12345\bar{67890}}
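If you would rather not depend on bs4 and lxml, the same recursive idea can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch, not part of the answer above: the name to_latex is made up for illustration, the XML declaration is left out of the sample string to keep it minimal, and, like the bs4 version, it does not escape LaTeX special characters. Note that ElementTree stores text following a child's closing tag on that child's tail attribute, so it has to be handled separately.

```python
import xml.etree.ElementTree as ET


def to_latex(elem):
    # Hypothetical helper: render an element as \tag{...}, recursing into children.
    parts = ["\\" + elem.tag + "{"]
    # Text that appears before the first child element.
    parts.append((elem.text or "").strip())
    for child in elem:
        parts.append(to_latex(child))
        # Text after the child's closing tag is stored on child.tail.
        parts.append((child.tail or "").strip())
    parts.append("}")
    return "".join(parts)


xml_doc = """<foo>
12345
<bar>
67890
</bar>
</foo>"""

root = ET.fromstring(xml_doc)
print(to_latex(root))
```

For the sample document this should print the same \foo{12345\bar{67890}} as the bs4 version, and because it walks the tree from the root, the result does not depend on the order in which tags are processed.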