Python XML ElementTree not reading node with &

Time:09-26

I have an XML file in which one of the nodes contains '&' within a string:

<uid>JAMES&001</uid>

Now, when I try to read the whole XML using the following code:

import xml.etree.ElementTree as et

tree = et.parse(fileName)
root = tree.getroot()
ids = root.findall("uid")

I get an error on the line of the above-mentioned node:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 21

The code works fine on other instances where there is no '&'. I guess it's breaking the string.

Can it be fixed with encoding? How? I searched through other questions but couldn't find an answer.

TIA

CodePudding user response:

You need to sanitize your XML first, since it isn't well-formed: in XML, a bare & is reserved for starting an entity reference such as &amp; or &lt;.

You need to replace the offending & with its escaped form, using something like .replace("&", "&amp;")

One way to use it:

import xml.etree.ElementTree as ET

with open(fileName, 'r') as f:
    read_data = f.read()

# Escape the bare '&' before handing the text to the parser
doc = ET.fromstring(read_data.replace("&", "&amp;"))
print(doc.find('./uid').text)

Output, given your sample, should be:

JAMES&001
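
One caveat with a blanket .replace("&", "&amp;"): if the file already contains correctly escaped references (for example &amp; or &lt;), they get escaped a second time. A minimal sketch of a more selective approach is below. It assumes a regular expression is good enough to tell a bare & from the start of an entity or character reference. The helper name escape_bare_ampersands and the sample string are made up for illustration and are not part of ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' that does NOT start something shaped like an entity or
# character reference (&amp;, &lt;, &#38;, &#x26;, ...).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Hypothetical helper: replace only stray '&' characters with '&amp;'."""
    return BARE_AMP.sub("&amp;", text)


# Made-up sample: one bare '&' and one that is already escaped.
sample = "<root><uid>JAMES&001</uid><uid>TOM &amp; JERRY</uid></root>"

doc = ET.fromstring(escape_bare_ampersands(sample))
uids = [node.text for node in doc.findall("uid")]
print(uids)  # ['JAMES&001', 'TOM & JERRY']
```

This is still a text-level patch, not a real fix: it would also rewrite an & inside a CDATA section, and it leaves alone anything that merely looks like an entity reference. If you control whatever produces the file, escaping the value when the XML is written is the more reliable option.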