How to remove the newline after xml declaration by Python ElementTree?


I am writing XML to a string. Here is my code, which includes the declaration:

updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

Output (a newline is added after the declaration):

<?xml version='1.0' encoding='utf8'?>

updatedxml is later serialized with json.dumps, and there is a "\n" in the output. Is there a Pythonic way to get rid of it?

"<?xml version='1.0' encoding='utf8'?>\n<manifest>...</manifest>"

CodePudding user response:

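If you want to remove all newlines, you can use Python's str.replace(). Apply it to the XML string before dumping: json.dumps escapes each newline as a backslash followed by n, so replace("\n", "") on the dumped text matches nothing.

print(json.dumps(updatedxml.replace("\n", "")))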

To remove the newline just before <manifest> but retain all others, fetch its index in the XML string, then exclude that character before dumping:

idx = updatedxml.index("\n<manifest>")
print(json.dumps(updatedxml[:idx] + updatedxml[idx + 1:]))

Since you asked for a Pythonic way, I suppose you could use a list comprehension, though the above is likely both faster and much easier to read.

idx = updatedxml.index("\n<manifest>")
print(json.dumps("".join([char for i, char in enumerate(updatedxml) if i != idx])))
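One detail worth checking here is how json.dumps treats the newline: it escapes it as the two characters backslash and n, so the raw newline only exists in the XML string, not in the dumped text. A minimal sketch, assuming a small hypothetical <manifest> document in place of the real one:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the real manifest.
root = ET.fromstring("<manifest><item>a</item></manifest>")
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

# The dumped JSON text contains backslash + n, never a raw newline.
raw_dump = json.dumps(updatedxml)
print("\\n" in raw_dump, "\n" in raw_dump)

# Remove the first newline (the one after the declaration) before dumping.
cleaned = updatedxml.replace("\n", "", 1)
dump = json.dumps(cleaned)
print(dump)
```

Removing the newline before dumping keeps the logic independent of JSON escaping rules.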

CodePudding user response:

This happens because ElementTree's tostring function internally calls the write method, which is implemented as shown below:

def tostring(element, encoding=None, method=None, *, short_empty_elements=True):
    stream = io.StringIO() if encoding == 'unicode' else io.BytesIO()
    ElementTree(element).write(stream, encoding, method=method, short_empty_elements=short_empty_elements)
    return stream.getvalue()

def write(self, file_or_filename, encoding=None, xml_declaration=None, default_namespace=None, method=None, *,
          short_empty_elements=True):
    if not method:
        method = "xml"
    elif method not in _serialize:
        raise ValueError("unknown method %r" % method)
    if not encoding:
        if method == "c14n":
            encoding = "utf-8"
        else:
            encoding = "us-ascii"
    enc_lower = encoding.lower()
    with _get_writer(file_or_filename, enc_lower) as write:
        if method == "xml" and (xml_declaration or (xml_declaration is None and enc_lower not in ("utf-8", "us-ascii", "unicode"))):
            declared_encoding = encoding
            if enc_lower == "unicode":
                # Retrieve the default encoding for the xml declaration
                import locale
                declared_encoding = locale.getpreferredencoding()
            # The declaration is always written with a trailing newline
            write("<?xml version='1.0' encoding='%s'?>\n" % (declared_encoding,))
        if method == "text":
            _serialize_text(write, self._root)
        else:
            qnames, namespaces = _namespaces(self._root, default_namespace)
            serialize = _serialize[method]
            serialize(write, self._root, qnames, namespaces, short_empty_elements=short_empty_elements)

You can see that the "\n" is hard-coded into the declaration string that this write method emits, so there is no option to suppress it.
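The condition in write also suggests an alternative: with encoding="unicode" and no explicit xml_declaration, no declaration is written at all, so it can be prepended by hand without the newline. A minimal sketch of that idea, assuming a hypothetical <manifest> document and that the hand-written declaration text is acceptable to whatever consumes the string:

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the real manifest.
root = ET.fromstring("<manifest><item>a</item></manifest>")

# With encoding="unicode", tostring returns a str and emits no declaration.
body = ET.tostring(root, encoding="unicode", method="xml")

# Prepend the declaration manually, without the trailing newline.
declaration = "<?xml version='1.0' encoding='utf8'?>"
updatedxml = declaration + body
print(updatedxml)
```

Note that the encoding named in the declaration is then only a label chosen by you; the value itself is a Python str.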

The easiest way is to remove the "\n" afterwards.


If you only want to remove the first instance of "\n", pass a count of 1. The newline is already escaped in the JSON text, so the pattern to match is "\\n":

print(json.dumps(updatedxml).replace("\\n", "", 1))
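Replacing the first newline assumes the string really starts with a declaration. A slightly more defensive sketch is shown here; strip_declaration_newline is a hypothetical helper name, not part of ElementTree, and it only drops the newline when it directly follows the leading declaration:

```python
import json


def strip_declaration_newline(xml_text):
    """Drop the newline that follows a leading XML declaration, if present."""
    head, sep, tail = xml_text.partition("?>\n")
    # Only act when the first "?>" closes a declaration at the very start.
    if sep and head.startswith("<?xml") and "?>" not in head:
        return head + "?>" + tail
    return xml_text


with_decl = "<?xml version='1.0' encoding='utf8'?>\n<manifest>\n</manifest>"
without_decl = "<manifest>\n</manifest>"

print(json.dumps(strip_declaration_newline(with_decl)))
print(json.dumps(strip_declaration_newline(without_decl)))
```

Newlines inside the document body are left alone in both cases.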