Home > Back-end >  LOAD XML command results in missing fields/nodes
LOAD XML command results in missing fields/nodes

Time:10-01

tldr; When I execute the LOAD XML statement below it appears successful but for some reason the lessonName field is always empty. All of the other fields from the XML doc load properly but for some reason this one is skipped.

I am trying to load an XML document into an SQL table using MySQL. There are no embedded tags in the field in question, it is not trying to read attributes of the tag; it is a straight forward tag that contains a string. It is the lessonName field being skipped. Below is my XML document:

<?xml version = "1.0"?>

<Lesson:LessonContent xmlns:Lesson = "UniqueURIHERE:Lessonv1.0.xsd">

    <row>
        <courseID>COMP466</courseID>
    
        <unitNumber>1</unitNumber>
    
        <unitName>Intro to HTML</unitName>
    
        <lessonName>Introduction</lessonName>

        <!--The following text '&lt;p&gt;' paragrah goes here '&lt;/p&gt;' should enclose each 
        paragraph of the lesson-->
        <textBody>&lt;p&gt; A bunch of stuff related to lesson content is written here...&lt;/p&gt;
        </textBody>
    </row>
</Lesson:LessonContent>

This is my XSD

<?xml version = '1.0'?>

<schema xmlns = "http://www.w3.org/2001/XMLSchema"
        xmlns:Lesson = "UniqueURIHERE:Lessonv1.0.xsd"
        targetNamespace = "UniqueURIHERE:Lessonv1.0.xsd">
        
    <element name = 'LessonContent' type = 'Lesson:root'/>
    
    <complexType name = 'root'>
        <sequence>
            <element name = 'row' type = 'Lesson:row'/>
        </sequence>
    </complexType>
    
    <complexType name = 'row'>
        <sequence>
            <element name = 'courseID' type = 'string'/>
            <element name = 'unitNumber' type = 'float'/>
            <element name = 'unitName' type = 'string'/>
            <element name = 'lessonName' type = 'string'/>
            <element name = 'textBody' type = 'string'/>
        </sequence>
    </complexType>
    
</schema>

This is the command I use to load the XML document:

LOAD XML INFILE 'Unit1_introToHTML.xml' INTO TABLE LESSON ROWS IDENTIFIED BY '<row>';

This is the columns for the table I am loading it into

SHOW COLUMNS FROM LESSON;
 ------------ -------------- ------ ----- --------- ------- 
| Field      | Type         | Null | Key | Default | Extra |
 ------------ -------------- ------ ----- --------- ------- 
| courseID   | varchar(50)  | NO   | PRI | NULL    |       |
| unitName   | varchar(255) | YES  |     | NULL    |       |
| lessonName | varchar(255) | NO   | PRI | NULL    |       |
| unitNumber | int(11)      | NO   |     | NULL    |       |
| textBody   | text         | NO   |     | NULL    |       |
 ------------ -------------- ------ ----- --------- ------- 
5 rows in set (0.006 sec)

When I execute the LOAD XML statement it appears successful but for some reason the lessonName field is always empty. All of the other fields load properly but for some reason this one is skipped.

I spent quite a bit trying to look for answers already here on SO but all I could find were people trying to load nodes with embedded nodes or with attributes that weren't working (clearly not my problem). There are no error codes or anything to even point me in the right direction when I execute the command, it appears successful. I get the following message back confirming success:

Query OK, 1 row affected (0.002 sec)

Records: 1 Deleted: 0 Skipped: 0 Warnings: 0

When I perform a 'SELECT * FROM LESSON;' to confirm my record was loaded, the lessonName column is empty but all other fields are as I would expect.

CodePudding user response:

So I fixed my issue but I still don't understand the problem if anyone out there wants to try to enlighten. I removed the comments after the lessonName tag in the XML document and it loaded in fine. For some reason the comment was causing the tag to be skipped?

If anyone out there understands why this was a problem I'd love to hear it! Seems silly I cant use comments inside of XML documents that I need to load to an SQL table? I changed the XML to the below (literally just removed the comment) and now it's perfect.

<?xml version = "1.0"?>

<Lesson:LessonContent xmlns:Lesson = "UniqueURIHERE:Lessonv1.0.xsd">

    <row>
        <courseID>COMP466</courseID>
    
        <unitNumber>1</unitNumber>
    
        <unitName>Intro to HTML</unitName>
    
        <lessonName>Introduction</lessonName>

        <textBody>&lt;p&gt; A bunch of stuff related to lesson content is written here...&lt;/p&gt;
        </textBody>
    </row>
</Lesson:LessonContent>

CodePudding user response:

I did a quick search for bugs related to the LOAD XML feature and my suspicions were confirmed. The LOAD XML feature probably does not (well, did not originally) use a proper XML parser to parse the input. See these bugs as evidence:

https://bugs.mysql.com/bug.php?id=86916

https://bugs.mysql.com/bug.php?id=84245

Conclusion: This is probably a bug in the MySQL XML LOAD feature.

  • Related