parseString not working for me in xml.sax (Python)

Time:06-17

I need to check that some XML is well-formed, but the XML comes in a variable (str), not from a file.

So I figured this would be easy to do with xml.sax. But I can't get it to work for me. It works fine when parsing a file, but I get a strange error when parsing a string.

Here's my test-code:

from xml.sax import make_parser, parseString
import os

filename = os.path.join('.', 'data', 'data.xml')
xmlstr = "<note>\n<to>Mary</to>\n<from>Jane</from>\n<heading>Reminder</heading>\n<body>Go to the zoo</body>\n</note>"


def parsefile(file):
    parser = make_parser()
    parser.parse(file)


def parsestr(xmlstr):
    parser = make_parser()
    parseString(xmlstr.encode('utf-8'), parser)


try:
    parsefile(filename)
    print("%s is well-formed" % filename)
except Exception as e:
    print("%s is NOT well-formed! %s" % (filename, e))


try:
    parsestr(xmlstr)
    print("%s is well-formed" % ('xml string'))
except Exception as e:
    print("%s is NOT well-formed! %s" % ('xml string', e))

When executing the script, I get this:

./data/data.xml is well-formed
xml string is NOT well-formed! 'ExpatParser' object has no attribute 'processingInstruction'

What am I missing?

Answer:

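The second argument to parseString is supposed to be a ContentHandler, not a parser. Because you're passing a parser where a handler is expected, xml.sax looks for handler methods such as processingInstruction on the parser object, doesn't find them, and raises the AttributeError you're seeing.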
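
If you would rather keep the make_parser() approach from the question, here is a minimal sketch of roughly what parseString does for you: the handler is attached to the parser with setContentHandler, and the string is wrapped in a file-like object before being parsed. The name parse_with_parser is made up for illustration and is not part of xml.sax.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


def parse_with_parser(xmlstr):
    """Parse an XML string with an explicitly created parser.

    Hypothetical helper, not part of xml.sax. Raises SAXParseException
    if the document is not well-formed.
    """
    parser = make_parser()
    # The handler receives the SAX events; the parser only produces them.
    parser.setContentHandler(ContentHandler())
    # parse() accepts a file-like object, so wrap the encoded string.
    parser.parse(io.BytesIO(xmlstr.encode("utf-8")))


parse_with_parser("<note><to>Mary</to></note>")
print("document is well-formed")
```

In practice parseString is shorter and does the same job, so this is mostly useful for seeing where the parser and the handler each fit.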

You're expected to subclass ContentHandler and then handle the SAX events as necessary. In this case, you're not actually trying to extract any information from the document, so you could use the base ContentHandler class:

from xml.sax import parseString, SAXParseException
from xml.sax.handler import ContentHandler

xmlstr = "<note>\n<to>Mary</to>\n<from>Jane</from>\n<heading>Reminder</heading>\n<body>Go to the zoo</body>\n</note>"

try:
    parseString(xmlstr, ContentHandler())
    print("document is well-formed")
except SAXParseException as err:
    print("document is not well-formed:", err)
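
If you later do want to pull information out of the document, subclassing ContentHandler might look something like the sketch below. TagCollector and collect_tags are hypothetical names chosen for this example; only parseString, ContentHandler, startElement and SAXParseException come from xml.sax.

```python
from xml.sax import parseString, SAXParseException
from xml.sax.handler import ContentHandler


class TagCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters.
        self.tags.append(name)


def collect_tags(xmlstr):
    """Return the element names in xmlstr, or None if it is not well-formed."""
    handler = TagCollector()
    try:
        parseString(xmlstr.encode("utf-8"), handler)
    except SAXParseException:
        return None
    return handler.tags


print(collect_tags("<note><to>Mary</to><from>Jane</from></note>"))
print(collect_tags("<note><to>Mary</note>"))
```

The first call prints the list of tag names; the second prints None because the to element is never closed.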