How to remove the newline after xml declaration by Python ElementTree?

Time:09-16

I am writing XML to a string. Here is my code, which includes the declaration:

updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

Output (a newline is added after the declaration):

<?xml version='1.0' encoding='utf8'?>
<manifest>...</manifest>

updatedxml is later serialized with json.dumps before it is passed on as JSON:

print(json.dumps(updatedxml))

and there is a "\n" in the output. Is there any Pythonic way to get rid of it?

"<?xml version='1.0' encoding='utf8'?>\n<manifest>...</manifest>"

CodePudding user response:

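If you want to remove all newlines, you can use Python's str.replace(). Do it before calling json.dumps, because in the JSON output the newline is already escaped as the two characters \ and n:

print(json.dumps(updatedxml.replace("\n", "")))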

To remove only the newline just before <manifest> and retain all others, find its index in the XML string, then exclude that character before dumping:

idx = updatedxml.index("\n<manifest>")
print(json.dumps(updatedxml[:idx] + updatedxml[idx + 1:]))

Since you asked for a Pythonic way, I suppose you could use a list comprehension, though the above is likely both faster and much easier to read.

idx = updatedxml.index("\n<manifest>")
print(json.dumps("".join([char for i, char in enumerate(updatedxml) if i != idx])))
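Whichever variant you pick, it matters whether the newline is removed before or after json.dumps. Here is a minimal sketch, assuming an empty stand-in manifest element (the asker's real document is not shown), that checks what the string contains at each stage:

```python
import json
import xml.etree.ElementTree as ET

# Stand-in for the asker's document; the real manifest content is not shown.
root = ET.Element("manifest")
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

dumped = json.dumps(updatedxml)

# The XML string holds a real newline character ...
has_real_newline = "\n" in updatedxml
# ... but the JSON text holds a backslash followed by the letter n.
has_escaped_newline = "\\n" in dumped
newline_survives_dump = "\n" in dumped

# Drop only the first newline (the one after the declaration), then encode.
cleaned = json.dumps(updatedxml.replace("\n", "", 1))
print(cleaned)
```

Removing the character from the XML string first avoids having to reason about JSON escaping at all.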

CodePudding user response:

This happens because ElementTree's tostring function internally calls the write method, which emits the newline. The relevant implementation is shown below:

def tostring(element, encoding=None, method=None, *, short_empty_elements=True):
  
    stream = io.StringIO() if encoding == 'unicode' else io.BytesIO()
    ElementTree(element).write(stream, encoding, method=method, short_empty_elements=short_empty_elements)
    return stream.getvalue()


def write(self, file_or_filename, encoding=None, xml_declaration=None, default_namespace=None, method=None, *, short_empty_elements=True):
    
    if not method:
        method = "xml"
    elif method not in _serialize:
        raise ValueError("unknown method %r" % method)
    if not encoding:
        if method == "c14n":
            encoding = "utf-8"
        else:
            encoding = "us-ascii"
    enc_lower = encoding.lower()
    with _get_writer(file_or_filename, enc_lower) as write:
        if method == "xml" and (xml_declaration or (xml_declaration is None and enc_lower not in ("utf-8", "us-ascii", "unicode"))):
            declared_encoding = encoding
            if enc_lower == "unicode":
                # Retrieve the default encoding for the xml declaration
                import locale
                declared_encoding = locale.getpreferredencoding()
            write("<?xml version='1.0' encoding='%s'?>\n" % (declared_encoding,))
        if method == "text":
            _serialize_text(write, self._root)
        else:
            qnames, namespaces = _namespaces(self._root, default_namespace)
            serialize = _serialize[method]
            serialize(write, self._root, qnames, namespaces, short_empty_elements=short_empty_elements)

You can see that the "\n" is written by this write method, immediately after the XML declaration.
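A quick way to confirm this behaviour yourself is the sketch below. It assumes an empty stand-in manifest element and simply splits the serialized output at the first newline:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document used only for this check.
root = ET.Element("manifest")

# encoding="utf8" makes tostring return bytes that start with the declaration.
with_decl = ET.tostring(root, encoding="utf8", method="xml")

# Split at the first newline: declaration on the left, document on the right.
first_line, separator, rest = with_decl.decode().partition("\n")
print(repr(first_line))
print(repr(rest))
```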

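The easiest way is to remove the "\n" afterwards. In the json.dumps output the newline appears as a backslash followed by n, which is why the pattern is written as "\\n". Note that this removes every escaped newline in the string: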

print("".join(json.dumps(updatedxml).split("\\n")))

If you only want to remove the first instance of "\n", do the following:

print(json.dumps(updatedxml).replace("\\n", "", 1))
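Another option, not taken from the code above but a sketch under my own assumptions, is to avoid the newline in the first place: serialize with encoding="unicode", which returns a str without a declaration, and prepend the declaration yourself. The helper name to_xml_without_newline and its declared_encoding parameter are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def to_xml_without_newline(root, declared_encoding="utf8"):
    """Hypothetical helper: join declaration and body with no newline."""
    # encoding="unicode" returns a str and writes no XML declaration.
    body = ET.tostring(root, encoding="unicode", method="xml")
    # The declared encoding is only a label here; the result is a str.
    return "<?xml version='1.0' encoding='%s'?>%s" % (declared_encoding, body)


root = ET.Element("manifest")  # stand-in for the real document
updatedxml = to_xml_without_newline(root)
print(json.dumps(updatedxml))
```

This keeps any newlines that are part of the document itself, since nothing is stripped after the fact.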