Don't encode Element text object using Python ElementTree

Time:03-04

I'm trying to put HTML data inside the text node of an element, but it gets escaped as if it were plain text rather than HTML.

Here is an MWE:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.text = data
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

The output is...

<p>&lt;a href="https://example.com"&gt;Example data gained from elsewhere.&lt;/a&gt;</p>

What I intended is...

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>

CodePudding user response:

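You can parse the HTML string into an Element with ET.fromstring and append it as a child of p. Note that this only works when the string is well-formed XML with a single root element: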

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.append(ET.fromstring(data))
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)
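
If the data has leading text or several top-level elements, ET.fromstring on the bare string raises ParseError, since it expects exactly one root. One possible workaround, as a rough sketch: wrap the fragment in a throwaway root, then move its text and children onto p. The helper name append_fragment and the wrapper tag are made up for illustration, and this still assumes the fragment is well-formed XML.

```python
from xml.etree import ElementTree as ET

def append_fragment(parent, fragment):
    """Graft an XML-well-formed fragment (text plus elements) onto parent.

    Hypothetical helper, not part of ElementTree.
    """
    # A throwaway root lets mixed text and several siblings parse.
    wrapper = ET.fromstring(f'<wrapper>{fragment}</wrapper>')
    children = list(parent)
    if wrapper.text:
        if children:
            # Leading text goes after the last existing child.
            children[-1].tail = (children[-1].tail or '') + wrapper.text
        else:
            parent.text = (parent.text or '') + wrapper.text
    # Move the parsed child elements (their tails come along with them).
    parent.extend(list(wrapper))
    return parent

p = ET.Element('p')
append_fragment(p, 'See <a href="https://example.com">this</a> and <b>that</b>.')
result = ET.tostring(p, encoding='unicode', method='html')
print(result)
```

Passing encoding='unicode' makes tostring return a str directly, so the separate .decode() step is not needed.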

CodePudding user response:

Assigning p.text = data tells ElementTree to treat the whole string as text content, so characters like < and > are escaped when the element is serialized. To get real markup in the output, parse the string and add the result as a child element, like below:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

d = ET.fromstring(data)
p = ET.Element('p')

p.append(d)
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

This gives the output:

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>
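
One caveat: ET.fromstring is an XML parser, so HTML that is not well-formed XML (an unclosed <br>, for example) raises ET.ParseError. Here is a minimal sketch of one way to cope with that, assuming it is acceptable to fall back to escaped text when parsing fails. The function name append_markup_or_text is hypothetical.

```python
from xml.etree import ElementTree as ET

def append_markup_or_text(parent, data):
    """Append data as a child element if it parses, else as escaped text.

    Hypothetical helper, not part of ElementTree.
    """
    try:
        parent.append(ET.fromstring(data))
    except ET.ParseError:
        # Not well-formed XML: keep it as plain (escaped) text instead.
        parent.text = (parent.text or '') + data
    return parent

good = append_markup_or_text(ET.Element('p'), '<a href="https://example.com">link</a>')
bad = append_markup_or_text(ET.Element('p'), 'line one<br>line two')

good_html = ET.tostring(good, encoding='unicode', method='html')
bad_html = ET.tostring(bad, encoding='unicode', method='html')
print(good_html)
print(bad_html)
```

For genuinely messy HTML, a dedicated HTML parser is probably a better fit than this fallback.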