Write a literal newline in ElementTree Attribute

Time:05-03

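from base64 import encodebytes
from io import BytesIO
from xml.etree.ElementTree import SubElement

subelement = SubElement(xml_tree, "image")
stream = BytesIO()
c.image.save(stream, format="PNG")
# encodebytes inserts a newline after every 76 characters of output
png = encodebytes(stream.getvalue()).decode("utf-8")
subelement.set("xlink:href", f"data:image/png;base64,{png}")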

I am writing a very basic SVG image element and trying to conform to RFC 2045, which requires the base64 data to be broken into lines within the file.

Instead, I get an escaped version:

<image xlink:href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAApUAAALUCAIAAADVN145AAAKMWlDQ1BJQ0MgUHJvZmlsZQAAeJyd&#10;...

The written data replaces the \n with &#10;. I need to have ElementTree literally write the \n to disk. Am I missing something? Or is there a workaround?

CodePudding user response:

I think the result you have is correct: the newline is written as an XML character reference (&#10;). You're serializing data as XML, so the value has to be encoded the way XML defines. In effect you wrap your image data twice: first with base64 encoding, then with XML escaping (which incidentally leaves most characters you care about unchanged).

In fact, if you put the newline character itself into the attribute, an XML parser would normalize it to a space when reading, so the line break would be lost anyway.
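
A minimal sketch of both points, using only the standard library. The payload bytes are made up for illustration, and a plain href attribute is used instead of xlink:href so that the fragment parses without a namespace declaration. It shows that &#10; decodes back to a real newline on parsing, while a literal newline in the attribute comes back as a space.

```python
import xml.etree.ElementTree as ET
from base64 import decodebytes, encodebytes

# Hypothetical payload, long enough that encodebytes inserts line breaks.
payload = bytes(range(100))
b64 = encodebytes(payload).decode("ascii")  # "\n" after every 76 characters

elem = ET.Element("image")
elem.set("href", "data:image/png;base64," + b64)
serialized = ET.tostring(elem, encoding="unicode")

# The serializer escapes each newline as a character reference...
has_reference = "&#10;" in serialized
# ...and the parser turns it back into a real newline.
round_tripped = ET.fromstring(serialized).get("href")
decoded = decodebytes(round_tripped.split(",", 1)[1].encode("ascii"))

# A literal newline written by hand into the attribute is normalized
# to a space by the parser, so it does not survive a round trip.
hand_written = serialized.replace("&#10;", "\n")
normalized = ET.fromstring(hand_written).get("href")
```

Here round_tripped still contains the newlines and decodes to the original bytes, whereas normalized has spaces where the newlines were.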

CodePudding user response:

That RFC is about MIME encoding, and I think you are trying to be too literal in implementing those formatting rules when encoding in XML format for that attribute.

Note that many implementations may elect to encode the local representation of various content types directly rather than converting to canonical form first, encoding, and then converting back to local representation. In particular, this may apply to plain text material on systems that use newline conventions other than a CRLF terminator sequence. Such an implementation optimization is permissible, but only when the combined canonicalization-encoding step is equivalent to performing the three steps separately.

Similarly, a CRLF sequence in the canonical form of the data obtained after base64 decoding must be converted to a quoted-printable hard line break, but ONLY when converting text data.
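
If the MIME-style line breaks turn out not to be needed inside the attribute, one possible workaround is to produce unbroken base64 with base64.b64encode instead of encodebytes, so there is no newline for ElementTree to escape. This is a sketch under that assumption; the payload and the plain href attribute are illustrative stand-ins.

```python
import xml.etree.ElementTree as ET
from base64 import b64encode, encodebytes

payload = bytes(range(100))  # hypothetical stand-in for the PNG bytes

wrapped = encodebytes(payload).decode("ascii")  # MIME-style, with "\n" line breaks
unbroken = b64encode(payload).decode("ascii")   # same data, no line breaks

elem = ET.Element("image")
elem.set("href", f"data:image/png;base64,{unbroken}")
serialized = ET.tostring(elem, encoding="unicode")
```

The two encodings differ only in the inserted newlines, so the serialized attribute contains no &#10; references.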
