I need to validate XML, but the markup comes in a variable (a str), not from a file.
I figured this would be easy to do with xml.sax, but I can't get it to work. It works fine when parsing a file, but I get a strange error when parsing a string.
Here's my test code:
from xml.sax import make_parser, parseString
import os

filename = os.path.join('.', 'data', 'data.xml')
xmlstr = "<note>\n<to>Mary</to>\n<from>Jane</from>\n<heading>Reminder</heading>\n<body>Go to the zoo</body>\n</note>"

def parsefile(file):
    parser = make_parser()
    parser.parse(file)

def parsestr(xmlstr):
    parser = make_parser()
    parseString(xmlstr.encode('utf-8'), parser)

try:
    parsefile(filename)
    print("%s is well-formed" % filename)
except Exception as e:
    print("%s is NOT well-formed! %s" % (filename, e))

try:
    parsestr(xmlstr)
    print("%s is well-formed" % ('xml string'))
except Exception as e:
    print("%s is NOT well-formed! %s" % ('xml string', e))
When executing the script, I get this:
./data/data.xml is well-formed
xml string is NOT well-formed! 'ExpatParser' object has no attribute 'processingInstruction'
What am I missing?
CodePudding user response:
The second argument to parseString is supposed to be a ContentHandler, not a parser. Because you're passing in the wrong type of object, it doesn't have the methods the parser expects, such as processingInstruction, which is why you get that error.
You're expected to subclass ContentHandler and then handle the SAX events as necessary. In this case, you're not actually trying to extract any information from the document, so you could use the base ContentHandler class:
from xml.sax import parseString, SAXParseException
from xml.sax.handler import ContentHandler

xmlstr = "<note>\n<to>Mary</to>\n<from>Jane</from>\n<heading>Reminder</heading>\n<body>Go to the zoo</body>\n</note>"

try:
    parseString(xmlstr, ContentHandler())
    print("document is well formed")
except SAXParseException as err:
    print("document is not well-formed:", err)
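If you later do want to extract something from the document, subclassing ContentHandler might look roughly like the sketch below. The names TagCollector and check_xml are made up for illustration and are not part of xml.sax. The handler just records element names, and the helper reports whether the string parsed. Note that this only checks well-formedness; it does not validate against a DTD or schema.

```python
from xml.sax import parseString, SAXParseException
from xml.sax.handler import ContentHandler


class TagCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # The parser calls this once for every opening tag it encounters.
        self.tags.append(name)


def check_xml(xml_text):
    """Return (True, tags) if xml_text is well-formed, else (False, message)."""
    handler = TagCollector()
    try:
        # Encoding to bytes mirrors the question's code.
        parseString(xml_text.encode("utf-8"), handler)
    except SAXParseException as err:
        return False, str(err)
    return True, handler.tags


good = "<note><to>Mary</to><from>Jane</from></note>"
bad = "<note><to>Mary</note>"  # <to> is never closed

print(check_xml(good))  # (True, ['note', 'to', 'from'])
print(check_xml(bad))   # (False, <parser error message>)
```

Returning a tuple rather than printing keeps the check reusable; the exact error text for the malformed case comes from the underlying parser and may differ between versions.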