Python xml - remove spaces to get aligned xml document

Time:10-09

I have a file, MyXml.xml, with this structure:

<?xml version='1.0' encoding='utf-8'?>
<tag1 atrib1='bla' atrib1='bla' atrib1='bla' atrib1='bla'>
    <tag2 atrib = 'something'>
        <tag3 atrib = 'something'>
           <tag4 atrib = '..'>
           </tag4>
        </tag3>
        <tag5 atrib = 'important'><div><h1>ContentFrom **OldXml.xml** </h1></div>
        ...
        </tag5>
    </tag2>
 </tag1>

Does anyone have an idea how to get it into this form (with all leading spaces removed):

<?xml version='1.0' encoding='utf-8'?>
<tag1 atrib1='bla' atrib1='bla' atrib1='bla' atrib1='bla'>
<tag2 atrib = 'something'>
<tag3 atrib = 'something'>
<tag4 atrib = '..'>
<tag5 atrib = 'important'><div><h1>ContentFrom **OldXml.xml** </h1></div>
...

I have tried this, but it doesn't work:

import codecs
from xml.dom import minidom

# Read in the file to a DOM data structure.
original_document = minidom.parse("MyXml.xml")

# Open a UTF-8 encoded file, because it's fairly standard for XML.
stripped_file = codecs.open("New_MyXml.xml", "w", encoding="utf8")

# Tell minidom to format the child text nodes without any extra whitespace.
original_document.writexml(stripped_file, indent="", addindent="", newl="")

stripped_file.close()

EDIT:

The file is created in a for loop that builds the elements; at the end, it is written like this:

    dom = xml.dom.minidom.parseString(ET.tostring(root))
    xml_string = dom.toprettyxml()
    part1, part2 = xml_string.split('?>')
    with open("MyXml.xml", 'w') as xfile:
        # The with block closes the file, so no explicit close() is needed.
        xfile.write(part1 + 'encoding=\"{}\"?>\n'.format(m_encoding) + part2)

EDIT: newest code, which prints the whole document on one line:

    dom = xml.dom.minidom.parseString(ET.tostring(root))
    xml_string = dom.toxml()
    part1, part2 = xml_string.split('?>')
    xmlstring = f'{part1} encoding="{m_encoding}"?>\n {part2}'
    with open("MyXml.xml", 'w') as xfile:
        for line in xmlstring.split("\n"):
          xfile.write(line.strip() + "\n")

CodePudding user response:

If you literally just want to strip whitespace, you don't need (or want) an xml parser at all:

from pathlib import Path

inf = Path("my-input.xml")
with inf.open() as f, inf.with_name(f"stripped-{inf.name}").open("w") as g:
    for line in f:
        g.write(line.strip() + "\n")

pathlib only stands in for os.path, open, and so on here: you can rewrite this without it if you don't like it (though pathlib is so much nicer than munging text strings for paths that I doubt you would want to).
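
For anyone who would rather avoid pathlib, here is a rough equivalent using plain open and os.path. The helper name strip_file and the file names in the demo are made up for illustration; they are not from the question.

```python
import os
import tempfile


def strip_file(src, dst):
    """Copy src to dst, removing leading/trailing whitespace from every line."""
    with open(src, encoding="utf-8") as f, open(dst, "w", encoding="utf-8") as g:
        for line in f:
            g.write(line.strip() + "\n")


# Self-contained demo on a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "my-input.xml")
    dst = os.path.join(tmp, "stripped-my-input.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<a>\n    <b>\n        <c/>\n    </b>\n</a>\n")
    strip_file(src, dst)
    with open(dst, encoding="utf-8") as g:
        result = g.read()

print(result)
```

The behavior should match the pathlib version; only the path handling differs.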

If you do need to load the document with a parser, use exactly the same trick when it comes to writing: serialize the document to a string and iterate over that string line by line.
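
A minimal sketch of the parser route, assuming the document is loaded with minidom. The sample XML and the helper strip_lines are hypothetical. Note that minidom keeps the original indentation as whitespace text nodes, which is why the serialized output still has to be stripped.

```python
from xml.dom import minidom

# Hypothetical, already-indented input; any well-formed XML string works.
source = "<root>\n    <child attr='x'>\n        <leaf/>\n    </child>\n</root>"

dom = minidom.parseString(source)

# toxml() keeps the whitespace text nodes from the input,
# so the indentation is still present in the serialized string.
serialized = dom.toxml()


def strip_lines(text):
    """Strip each line and drop lines that are empty after stripping."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


stripped = strip_lines(serialized)
print(stripped)
```

Be aware that stripping every line also trims text content that spans several lines, so this is only safe when that whitespace is not significant.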


Demonstration:

from tempfile import TemporaryFile

data = """<?xml version='1.0' encoding='utf-8'?>
<tag1 atrib1='bla' atrib1='bla' atrib1='bla' atrib1='bla'>
    <tag2 atrib = 'something'>
        <tag3 atrib = 'something'>
           <tag4 atrib = '..'>
           </tag4>
        </tag3>
        <tag5 atrib = 'important'><div><h1>ContentFrom **OldXml.xml** </h1></div>
        ...
        </tag5>
    </tag2>
 </tag1>"""

with TemporaryFile(mode="w+") as f, TemporaryFile(mode="w+") as g:
    f.write(data)
    f.seek(0)
    print("Before:")
    for line in f:
        print(line, end="")
        g.write(line.strip() + "\n")

    print("\n\nAfter:")
    g.seek(0)
    for line in g:
        print(line, end="")

Edit:

In your case there is a much simpler option: don't use toprettyxml at all; use toxml. (Update: apparently toxml renders with no line breaks at all.) Even so, we can apply the same trick to the toprettyxml output:

xml_string = dom.toprettyxml()
part1, part2 = xml_string.split('?>')
xmlstring = f'{part1} encoding="{m_encoding}"?>\n {part2}'
with open("MyXml.xml", 'w') as xfile:
    for line in xmlstring.split("\n"):
        xfile.write(line.strip() + "\n")

However, I suspect toprettyxml(indent="") will do the same thing:

xml_string = dom.toprettyxml(indent="")
...
with open("MyFile.xml", "w") as f:
    f.write(xml_string)
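
Here is a small self-contained check of that idea, as a sketch only. The tree below is a hypothetical stand-in for the one built in the question's loop. It also passes the encoding argument of toprettyxml, which makes minidom write the encoding into the XML declaration and return bytes, so the split('?>') step may not be needed at all.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical tree standing in for the one built in the question's loop.
root = ET.Element("tag1", {"atrib1": "bla"})
tag2 = ET.SubElement(root, "tag2", {"atrib": "something"})
ET.SubElement(tag2, "tag3", {"atrib": "something"})
tag5 = ET.SubElement(tag2, "tag5", {"atrib": "important"})
tag5.text = "content"

dom = minidom.parseString(ET.tostring(root))

# indent="" keeps one element per line but adds no leading spaces.
# With encoding set, the result is bytes and the declaration names the encoding.
xml_bytes = dom.toprettyxml(indent="", encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

Because the result is bytes, it would be written with a file opened in binary mode ("wb"). This works cleanly here because the tree was built in code and has no whitespace text nodes; a document parsed from an already-indented file would still carry its old indentation.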