Suppose we have the following XML file:
<XML Data>
<Record>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="C"></Product>
</Service>
</Record>
<Record>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="Y"></Product>
</Service>
</Record>
<Record>
<Service>
<Product id="U"></Product>
</Service>
</Record>
</XML Data>
As you can see, each record represents a single client but has no unique identifier. Each service can have multiple products.
I want to get all products that have been sold with product A. Therefore, I am trying to get a list like this:
ServiceID
B
C
Y
I've been using:
import xml.etree.ElementTree as ET
CodePudding user response:
You can select elements based on an attribute via the [@attrib='value'] predicate, according to the official documentation. When testing this, I replaced your <XML Data> and </XML Data> tags with <Data> and </Data>, because an XML tag name cannot contain a space. Example code:
from xml.etree import ElementTree as ET
data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()
for product in root.findall("./Record/Service/Product[@id='A']"):
    print(product.attrib["id"])
    print(product.text)
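For a quick check without an input file, the same predicate can be tried on an inline string. This is only a minimal sketch: the shortened sample document and the names parsed_ok, matches and ids are made up for illustration, and the root is assumed to be renamed to <Data> as described above.

```python
from xml.etree import ElementTree as ET

# The original root tag contains a space, so the parser rejects the document.
try:
    ET.fromstring("<XML Data></XML Data>")
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)

# With a valid root name (assumed here: <Data>), the attribute predicate works.
root = ET.fromstring(
    "<Data><Record><Service>"
    '<Product id="A"></Product><Product id="B"></Product>'
    "</Service></Record></Data>"
)
matches = root.findall("./Record/Service/Product[@id='A']")
ids = [product.get("id") for product in matches]
print(ids)
```

Note that product.text is None for empty elements such as <Product id="A"></Product>, so the print(product.text) line above prints None for this data.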
Edit
After reading your question again, I noticed that you first want to check whether a product with ID A exists within a Service, and only then store the other IDs (unique and sorted), so I adapted the code:
from xml.etree import ElementTree as ET
data = ET.parse(r"/path/to/your/input.xml")
root = data.getroot()
product_ids = set()
for service in root.findall("./Record/Service"):
    list_contains_a = False
    # iterate once to identify if list contains product with ID = 'A'
    for product in service.findall("./Product"):
        if product.attrib["id"] == "A":
            list_contains_a = True
    # if list contains product with ID = 'A', iterate second time and fetch IDs
    if list_contains_a:
        for product in service.findall("./Product"):
            if product.attrib["id"] == "A":
                continue
            # add to set to prevent duplicates
            product_ids.add(product.attrib["id"])
# sorted() already returns a list, so it can be concatenated with the header
ret_list = ["ServiceID"] + sorted(product_ids)
print(ret_list)
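As a possible shortening, the first pass over the products could be replaced by a single find() call with the attribute predicate, which returns None when a Service has no matching Product. The sketch below is one way to write that, not the only one: the helper name products_sold_with and the inline SAMPLE string are hypothetical, and the root is again assumed to be <Data>.

```python
from xml.etree import ElementTree as ET

# Sample data from the question, with the root renamed to <Data> (assumption).
SAMPLE = """<Data>
<Record><Service>
<Product id="A"></Product><Product id="B"></Product><Product id="C"></Product>
</Service></Record>
<Record><Service>
<Product id="A"></Product><Product id="B"></Product><Product id="Y"></Product>
</Service></Record>
<Record><Service>
<Product id="U"></Product>
</Service></Record>
</Data>"""


def products_sold_with(root, target_id):
    """Return the sorted, unique product IDs that share a Service with target_id."""
    found = set()
    for service in root.findall("./Record/Service"):
        # find() returns None when no Product with the target ID is present
        if service.find(f"./Product[@id='{target_id}']") is None:
            continue
        for product in service.findall("./Product"):
            product_id = product.get("id")
            if product_id is not None and product_id != target_id:
                found.add(product_id)
    return sorted(found)


root = ET.fromstring(SAMPLE)
result = ["ServiceID"] + products_sold_with(root, "A")
print(result)  # ['ServiceID', 'B', 'C', 'Y']
```

Using product.get("id") instead of product.attrib["id"] avoids a KeyError if some Product element lacks an id attribute.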