Python - XML find nodes then remove

Time:11-03

I have the following XML file:

<customer>
    <id>807997287</id>
    <dateCreated>2022-11-13T00:00:00Z</dateCreated>
    <status>Created</status>
    <client>
        <id>807997223</id>
        <firstname>Jeff</firstname>
        <lastname>Smith</lastname>
        <address>
            <id>4388574</id>
            <home>
                <addressLine1>Address Line 1</addressLine1>
                <addressLine2>Address Line 2</addressLine2>
                <addressLine3>Address Line 3</addressLine3>
                <addressLine4>Address Line 4</addressLine4>
                <postCode>XXX ZZZ</postCode>
            </home>
            <telephoneNumbers>
                <telephone>
                    <id>807997230</id>
                    <areaCode>01123</areaCode>
                    <phoneNumber>123123</phoneNumber>
                    <usage>Work</usage>
                </telephone>
                <telephone>
                    <id>807997232</id>
                    <areaCode>01564</areaCode>
                    <phoneNumber>123123</phoneNumber>
                    <usage>Home</usage>
                </telephone>
            </telephoneNumbers>
        </address>
    </client>
</customer>

And I need to be able to remove all the ID nodes.

I have tried the following code, but it A) doesn't find all the IDs and B) doesn't remove them:

import xml.etree.ElementTree as ET

tree = ET.ElementTree()
tree.parse('test.xml')
root = tree.getroot()
ids = root.findall(".//id")

for item in ids:
    ids.remove(item)
    print(ET.tostring(item))

t = ET.ElementTree(root)
t.write("output.xml")

The command-line output is:

b'<id>807997287</id>\n    '
b'<id>4388574</id>\n            '
b'<id>807997232</id>\n                '

And the output.xml remains the same.

Can anyone help point me in the right direction with this one please?

CodePudding user response:

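In your loop, ids.remove(item) removes the element from the Python list returned by findall, not from the XML tree, and changing that list while iterating over it is why every other ID is skipped. An element can only be removed through its parent, so you are probably looking for something like: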

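# Earlier attempt: for elem in root.findall('.//*[id]'):
# EDIT: './/id/..' selects the parent of every <id>, including the root
for parent in root.findall('.//id/..'):
    # Look only at direct children, so remove() is always called on the real parent
    for id_elem in parent.findall('id'):
        parent.remove(id_elem)
print(ET.tostring(root).decode())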

This should print your XML with every id element removed, which is your expected output.
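For completeness, here is a minimal self-contained sketch of the same idea that can be run without test.xml. The helper name remove_children_by_tag and the shortened SAMPLE document are assumptions made for illustration, not part of the original question. It visits every element (root included) and removes matching direct children through their parent:

```python
import xml.etree.ElementTree as ET


def remove_children_by_tag(root, tag):
    """Remove every element named `tag` from the tree; return how many were removed.

    Hypothetical helper for illustration only.
    """
    removed = 0
    # root.iter() yields the root and all descendants; list() takes a snapshot
    # so the tree can be modified safely while we loop.
    for parent in list(root.iter()):
        # findall(tag) with a bare tag name matches direct children only,
        # which is exactly what Element.remove() requires.
        for child in parent.findall(tag):
            parent.remove(child)
            removed += 1
    return removed


# Shortened version of the XML from the question (an assumption, for brevity).
SAMPLE = """<customer>
    <id>807997287</id>
    <status>Created</status>
    <client>
        <id>807997223</id>
        <firstname>Jeff</firstname>
        <telephone>
            <id>807997230</id>
            <usage>Work</usage>
        </telephone>
    </client>
</customer>"""

root = ET.fromstring(SAMPLE)
removed_count = remove_children_by_tag(root, "id")
print(removed_count, "id elements removed")
print(ET.tostring(root).decode())
```

To apply this to a file instead, parse it with ET.parse('test.xml'), call the helper on tree.getroot(), and then call tree.write('output.xml').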
