How to sort an xml by a nested child element text value using etree in Python

Time:03-25

I've seen variations of this question answered numerous times (Sorting XML in python etree, Sorting xml values using etree), yet I can't seem to adapt those answers to my case. I am trying to sort an imported XML file by the text of a specific sub-element, in this instance the "id" element. Below is the XML in question:

INPUT:

    <bookstore Location="New York">              
        <Genre type="Fiction">
            <name>Fiction</name>
            <id>4</id>
            <pages>300</pages>
            </Genre>
        <Genre type="Fiction">
            <name>Fictional Fiction</name>
            <id>2</id>
            <pages>500</pages>
        </Genre>
        <Genre type="Horror">
            <name>Horrors</name>
            <id>1</id>
            <pages>450</pages>
        </Genre>
        <Genre type="Horror">
            <name>Horrendous Horror</name>
            <id>3</id>
            <pages>20</pages>
        </Genre>
        <Genre type="Comedy">
            <name>Comedic Comedy</name>
            <id>0</id>
            <pages>1</pages>
        </Genre>
    </bookstore>

I want to organize all the Genre elements by their child element "id". This is the output I'm going for:

OUTPUT:

    <bookstore Location="New York">              
        <Genre type="Comedy">
            <name>Comedic Comedy</name>
            <id>0</id>
            <pages>1</pages>
        </Genre>
        <Genre type="Horror">
            <name>Horrors</name>
            <id>1</id>
            <pages>450</pages>
        </Genre>
        <Genre type="Fiction">
            <name>Fictional Fiction</name>
            <id>2</id>
            <pages>500</pages>
        </Genre>
        <Genre type="Horror">
            <name>Horrendous Horror</name>
            <id>3</id>
            <pages>20</pages>
        </Genre> 
        <Genre type="Fiction">
            <name>Fiction</name>
            <id>4</id>
            <pages>300</pages>
        </Genre>
    </bookstore>

This is the code I've tried:

    import xml.etree.ElementTree as ET

    def sortchildrenby(parent):
        parent[:] = sorted(parent, key=lambda child: child.tag == 'id')

    filename = "Example.xml"
    tree = ET.parse(filename)
    root = tree.getroot()
    attr = "type"
    for elements in root:
        sortchildrenby(elements)
    tree.write("exampleORGANIZED.xml")

This produces the following XML:

    <bookstore Location="New York">              
        <Genre type="Fiction">
            <name>Fiction</name>
            <pages>300</pages>
            <id>4</id>
            </Genre>
        <Genre type="Fiction">
            <name>Fictional Fiction</name>
            <pages>500</pages>
        <id>2</id>
            </Genre>
        <Genre type="Horror">
            <name>Horrors</name>
            <pages>450</pages>
        <id>1</id>
            </Genre>
        <Genre type="Horror">
            <name>Horrendous Horror</name>
            <pages>20</pages>
        <id>3</id>
            </Genre>
        <Genre type="Comedy">
            <name>Comedic Comedy</name>
            <pages>1</pages>
        <id>0</id>
            </Genre>
    </bookstore>

The "id" elements were moved to the end of each Genre element, and the Genre elements themselves were not re-sorted in ascending order.
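The result above follows from how the sort key is evaluated. As a minimal sketch (using one Genre element copied from the input, parsed from a string rather than a file), the key `child.tag == 'id'` yields only booleans, and because `sorted` is stable and `False` sorts before `True`, the "id" child simply moves to the end of its parent:

```python
import xml.etree.ElementTree as ET

# A single Genre element, copied from the input above.
genre = ET.fromstring(
    "<Genre type='Fiction'><name>Fiction</name><id>4</id><pages>300</pages></Genre>"
)

# The key is a boolean: False for <name> and <pages>, True for <id>.
keys = [child.tag == "id" for child in genre]

# sorted() is stable and False < True, so <id> just moves to the end.
genre[:] = sorted(genre, key=lambda child: child.tag == "id")
tags_after = [child.tag for child in genre]

print(keys)        # [False, True, False]
print(tags_after)  # ['name', 'pages', 'id']
```

The loop in the attempted code applies this to the children of each Genre, so the order of the Genre elements under the root is never touched.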

CodePudding user response:

Pass the whole root into the method instead of iterating over it, since you need to sort the <Genre> elements under the root, not the children of each individual <Genre>. Also, adjust the method to sort by the text of the child element rather than by a boolean expression:

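import xml.etree.ElementTree as ET

def sortchildrenby(parent, attr):
    # Reorder parent's direct children by the text of their <attr> child element.
    # Note: the text is compared as strings.
    parent[:] = sorted(parent, key=lambda child: child.find(attr).text)

tree = ET.parse("Input.xml")
root = tree.getroot()

sortchildrenby(root, "id")   # sort the <Genre> elements under <bookstore>

ET.indent(tree, space="\t", level=0)   # PRETTY PRINT (ADDED Python 3.9)
tree.write("Output.xml")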

Output

<bookstore Location="New York">
    <Genre type="Comedy">
        <name>Comedic Comedy</name>
        <id>0</id>
        <pages>1</pages>
    </Genre>
    <Genre type="Horror">
        <name>Horrors</name>
        <id>1</id>
        <pages>450</pages>
    </Genre>
    <Genre type="Fiction">
        <name>Fictional Fiction</name>
        <id>2</id>
        <pages>500</pages>
    </Genre>
    <Genre type="Horror">
        <name>Horrendous Horror</name>
        <id>3</id>
        <pages>20</pages>
    </Genre>
    <Genre type="Fiction">
        <name>Fiction</name>
        <id>4</id>
        <pages>300</pages>
    </Genre>
</bookstore>
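One caveat: the key above compares the "id" text as strings, which happens to work for the single-digit ids 0 to 4 but would place "10" before "2". A minimal sketch of a numeric variant follows; the helper name `sort_children_by_int` and the sample ids 10, 9, 2 are hypothetical and not from the question, and the sketch assumes every child has the tag and that its text is an integer:

```python
import xml.etree.ElementTree as ET

def sort_children_by_int(parent, tag):
    """Sort parent's direct children by the integer value of their <tag> child.

    Hypothetical helper: assumes every child has a <tag> element with integer text.
    """
    parent[:] = sorted(parent, key=lambda child: int(child.findtext(tag)))

# Hypothetical sample data with a two-digit id, to show the difference.
root = ET.fromstring(
    "<bookstore>"
    "<Genre><id>10</id></Genre>"
    "<Genre><id>9</id></Genre>"
    "<Genre><id>2</id></Genre>"
    "</bookstore>"
)

# String comparison: "10" sorts before "2" and "9".
by_text = [g.findtext("id") for g in sorted(root, key=lambda g: g.findtext("id"))]

# Integer comparison gives the numeric order.
sort_children_by_int(root, "id")
by_int = [g.findtext("id") for g in root]

print(by_text)  # ['10', '2', '9']
print(by_int)   # ['2', '9', '10']
```

If some elements might lack the child or hold non-numeric text, `int(...)` would raise, so a real script would need to decide how such elements should be ordered.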