Why does my XML parser write "°C" as "&#176;C" in a newly created file?

Time:10-07

I'm using the etree parser to edit an XML-based config file. I can read the file, find the text I want, and edit it. But when I write the data to a new file, "°C" is written as "&#176;C". I would like it to stay as it is, i.e. "°C". Could somebody explain why the parser replaces it like this?

Example: Original line: <parameter name="Temperature" Units="°C">30</parameter>

(Run the Python script, find "30" and set it to "200", then write the line to a new file.)

Edited line: <parameter name="Temperature" Units="&#176;C">200</parameter>

Could somebody help me understand this?

CodePudding user response:

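Per the documentation, etree.tostring() serializes to ASCII-encoded bytes by default. ASCII cannot represent the ° symbol, so it is written as the numeric character reference &#176;, which any XML parser reads back as the same character. To get the literal character in the output, pass the encoding parameter: encoding="unicode" returns a str with the ° intact.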


In [12]: string = '<parameter name="Temperature" Units="°C">30</parameter>'

In [13]: root = etree.fromstring(string)

In [14]: etree.tostring(root)
Out[14]: b'<parameter name="Temperature" Units="&#176;C">30</parameter>'

In [15]: etree.tostring(root, encoding="unicode")
Out[15]: '<parameter name="Temperature" Units="°C">30</parameter>'
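
Since the question is about writing to a new file, the same idea can be sketched with the tree's write() method. This is a minimal example, assuming the standard library's xml.etree.ElementTree (lxml's write() also accepts an encoding argument); the <config> wrapper element and the output file name are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical config content; the <config> wrapper is only for illustration.
source = '<config><parameter name="Temperature" Units="°C">30</parameter></config>'
root = ET.fromstring(source)

# Find the parameter and change its value, as described in the question.
param = root.find("./parameter[@name='Temperature']")
param.text = "200"

tree = ET.ElementTree(root)
out_path = os.path.join(tempfile.mkdtemp(), "edited_config.xml")

# write() defaults to US-ASCII, which turns the degree sign into &#176;.
# Passing encoding="utf-8" keeps it as a literal character in the file.
tree.write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back as UTF-8 to inspect what was actually written.
with open(out_path, encoding="utf-8") as f:
    written = f.read()

print(written)
```

The XML declaration records the encoding, so other tools reading the file know to decode it as UTF-8.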