I am writing an XML tree to a string. Here's my code, which includes the declaration:
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()
Output (a newline is added after the declaration):
<?xml version='1.0' encoding='utf8'?>
<manifest>...</manifest>
updatedxml is later serialized using json.dumps before it is parsed as JSON:
print(json.dumps(updatedxml))
There is a "\n" in the output. Is there a Pythonic way to get rid of it?
"<?xml version='1.0' encoding='utf8'?>\n<manifest>...</manifest>"
Answer:
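If you want to remove all newlines, you can just use Python's str.replace(). Do it before dumping, because json.dumps turns each newline into the escape sequence \n, so replacing "\n" in the dumped string finds nothing:
print(json.dumps(updatedxml.replace("\n", "")))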
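To make the escaping visible, here is a minimal self-contained check. The one-element tree is a hypothetical stand-in for the question's elided <manifest> content. It shows that the dumped string no longer contains a real newline, so you can strip either the real newline before dumping or the two-character escape after dumping:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the question's tree; the real <manifest> content is elided there.
root = ET.Element("manifest")
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

# The XML string contains a real newline after the declaration...
assert "\n" in updatedxml
dumped = json.dumps(updatedxml)
# ...but json.dumps escapes it as a backslash followed by "n".
assert "\n" not in dumped and "\\n" in dumped

# Option 1: strip the real newline before dumping.
before = json.dumps(updatedxml.replace("\n", ""))
# Option 2: strip the escape sequence after dumping.
after = dumped.replace("\\n", "")
print(before)
```

Both options give the same JSON string here. Option 1 is a little easier to reason about because it works on the XML text itself rather than on its JSON encoding.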
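To remove only the newline just before <manifest> but retain all others, fetch its index and then exclude that character. Do this on the XML string before dumping, because in the JSON output the newline has already become the escape sequence \n and index() would raise ValueError:
idx = updatedxml.index("\n<manifest>")
print(json.dumps(updatedxml[:idx] + updatedxml[idx + 1:]))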
Since you asked for a Pythonic way, I suppose you could use a list comprehension, though the above is likely both faster and much easier to read:
idx = updatedxml.index("\n<manifest>")  # compute the index once, not on every iteration
print(json.dumps("".join([char for i, char in enumerate(updatedxml) if i != idx])))
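The index lookup above hardcodes the root tag name. Here is a minimal sketch of a variant that removes only the newline following a leading <?xml ...?> declaration, whatever the root element is called. drop_newline_after_declaration is a hypothetical helper name, and the one-element tree is a stand-in for the question's elided content:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the question's tree.
root = ET.Element("manifest")
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()

def drop_newline_after_declaration(xml_text: str) -> str:
    """Remove only the newline that directly follows a leading <?xml ...?> declaration.

    Hypothetical helper: newlines anywhere else in the document are left alone.
    """
    if xml_text.startswith("<?xml"):
        end = xml_text.find("?>")  # end of the declaration
        if end != -1 and xml_text[end + 2:end + 3] == "\n":
            return xml_text[:end + 2] + xml_text[end + 3:]
    return xml_text  # no declaration, or no newline after it: unchanged

result = json.dumps(drop_newline_after_declaration(updatedxml))
print(result)
```

Like the index-based version, this edits the XML string before json.dumps runs, so it deals with the real newline rather than its escaped form.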
Answer:
This happens because of ElementTree's implementation of tostring, which internally calls the write method:
def tostring(element, encoding=None, method=None, *, short_empty_elements=True):
    stream = io.StringIO() if encoding == 'unicode' else io.BytesIO()
    ElementTree(element).write(stream, encoding, method=method, short_empty_elements=short_empty_elements)
    return stream.getvalue()

# ElementTree.write (a method of the ElementTree class)
def write(self, file_or_filename, encoding=None, xml_declaration=None, default_namespace=None, method=None, *,
          short_empty_elements=True):
    if not method:
        method = "xml"
    elif method not in _serialize:
        raise ValueError("unknown method %r" % method)
    if not encoding:
        if method == "c14n":
            encoding = "utf-8"
        else:
            encoding = "us-ascii"
    enc_lower = encoding.lower()
    with _get_writer(file_or_filename, enc_lower) as write:
        if method == "xml" and (xml_declaration or (xml_declaration is None and enc_lower not in ("utf-8", "us-ascii", "unicode"))):
            declared_encoding = encoding
            if enc_lower == "unicode":
                # Retrieve the default encoding for the xml declaration
                import locale
                declared_encoding = locale.getpreferredencoding()
            write("<?xml version='1.0' encoding='%s'?>\n" % (declared_encoding,))
        if method == "text":
            _serialize_text(write, self._root)
        else:
            qnames, namespaces = _namespaces(self._root, default_namespace)
            serialize = _serialize[method]
            serialize(write, self._root, qnames, namespaces, short_empty_elements=short_empty_elements)
You can see that the "\n" is written by this write method, directly after the XML declaration.
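Reading the condition in that excerpt: when xml_declaration is left as None, the declaration and its trailing "\n" are only written if the lowercased encoding is not "utf-8", "us-ascii" or "unicode". The question passes "utf8" without a hyphen, which is why the declaration appears at all. Here is a small check, assuming a bare <manifest> element as a stand-in tree. Newer Python versions restructure this code somewhat, so treat it as a sketch rather than a guarantee:

```python
import xml.etree.ElementTree as ET

root = ET.Element("manifest")  # stand-in for the question's tree

# "utf8" is not one of the exempt spellings, so a declaration plus "\n" is written.
with_decl = ET.tostring(root, encoding="utf8", method="xml").decode()
# "utf-8" is exempt, so with xml_declaration left at None no declaration is written.
without_decl = ET.tostring(root, encoding="utf-8", method="xml").decode()

print(repr(with_decl))
print(repr(without_decl))
```

Since you want the declaration, the newline comes with it, and removing it afterwards remains the practical fix.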
The easiest way is to remove the "\n" afterwards. In the json.dumps output it appears as the escape sequence \n, so split on "\\n":
print("".join(json.dumps(updatedxml).split("\\n")))
If you only want to remove the first instance of "\n", do this instead:
print(json.dumps(updatedxml).replace("\\n", "", 1))
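To see the difference between the two options, here is a sketch with a hypothetical tree whose body also contains a newline (set as an element's tail). Splitting on "\\n" removes every escaped newline, while replace(..., 1) removes only the first one, which is the newline after the declaration:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical tree whose body also contains a newline (the tail of <item>).
root = ET.Element("manifest")
child = ET.SubElement(root, "item")
child.tail = "\n"
updatedxml = ET.tostring(root, encoding="utf8", method="xml").decode()
dumped = json.dumps(updatedxml)

# Removes every escaped newline, including the one inside <manifest>.
all_removed = "".join(dumped.split("\\n"))
# Removes only the first escaped newline, i.e. the one after the declaration.
first_removed = dumped.replace("\\n", "", 1)

print(all_removed)
print(first_removed)
```

If newlines inside the document carry meaning (for example in text content), the single-replacement form is the safer choice.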