I have an XML file in which one of the nodes contains an '&' within a string:
<uid>JAMES&001</uid>
Now, when I try to read the whole XML using the following code:
tree = et.parse(fileName)
root = tree.getroot()
ids = root.findall("uid")
I get this error on the line of the above-mentioned node:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 21
The code works fine on other instances where there is no '&'. I guess it's breaking the string.
Can it be fixed with encoding? How? I searched through other questions but couldn't find an answer.
TIA
CodePudding user response:
You need to sanitize your XML first, since it isn't well formed: a literal & is not allowed in XML text.
You need to replace the offending & with its entity, &amp;
- something like .replace("&", "&amp;")
One way to use it:
import xml.etree.ElementTree as ET

with open(fileName, 'r') as f:
    read_data = f.read()
# Escape the bare '&' so the parser accepts the document
doc = ET.fromstring(read_data.replace("&", "&amp;"))
print(doc.find('./uid').text)
Output, given your sample, should be
JAMES&001
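One caveat with a plain .replace: if the file already contains valid entities such as &amp; or &lt;, they would be escaped a second time and come out as literal text like "&amp;". A minimal sketch of a more selective approach, assuming a regex that only touches an & that does not already start an entity or character reference (the names BARE_AMP and escape_bare_ampersands are made up for this example, and it does not special-case CDATA sections or undefined named entities):

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper pattern: matches '&' only when it is NOT followed by
# something that looks like an entity (&name;) or a character
# reference (&#123; or &#x1F;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace bare '&' with '&amp;' and leave existing entities alone."""
    return BARE_AMP.sub("&amp;", xml_text)


# Sample data invented for illustration: one bare '&' and one valid entity.
sample = "<root><uid>JAMES&001</uid><name>A &amp; B</name></root>"
doc = ET.fromstring(escape_bare_ampersands(sample))
print(doc.find("./uid").text)
print(doc.find("./name").text)
```

With this sample, the uid text should come back as JAMES&001 and the name text as A & B, whereas a plain .replace would turn the second one into "A &amp; B". If you control whatever produces the file, fixing the escaping there is the cleaner fix.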