I need to parse some XML to CSV. I am struggling getting the 'record' attribute to iterate. The code below can pull out the allocation text. How do I get the record product-id?
import xml.etree.ElementTree as ET
mytree = ET.parse('Salesforce_01_30_2023.xml')
myroot = mytree.getroot()
print(myroot)
for x in myroot.findall('record'):
product = myroot.attrib
inventory = x.find('allocation').text
print(product, inventory)
XML
<?xml version="1.0" encoding="UTF-8"?>
<records>
<record product-id="99124">
<allocation>15</allocation>
<allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
<perpetual>false</perpetual>
<preorder-backorder-handling>none</preorder-backorder-handling>
<ats>15</ats>
</record>
<record product-id="011443">
<allocation>0</allocation>
<allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
<perpetual>false</perpetual>
<preorder-backorder-handling>none</preorder-backorder-handling>
<ats>0</ats>
</record>
CodePudding user response:
To get product-id
number you can use .attrib["product-id"]
:
import xml.etree.ElementTree as ET
mytree = ET.parse('Salesforce_01_30_2023.xml')
myroot = mytree.getroot()
for product in myroot.findall('record'):
inventory = product.find('allocation').text
print(product.attrib['product-id'], inventory)
Prints:
99124 15
011443 0
CodePudding user response:
Option 1: You can use pandas
DataFrame read_xml() and to_csv():
import pandas as pd
df = pd.read_xml("prod_id.xml", xpath=".//record")
df.to_csv('prod.csv')
print(df.to_string())
Output:
product-id allocation allocation-timestamp perpetual preorder-backorder-handling ats
0 99124 15 2023-01-30T15:03:39.598Z False none 15
1 11443 0 2023-01-30T15:03:39.598Z False none 0
CSV:
,product-id,allocation,allocation-timestamp,perpetual,preorder-backorder-handling,ats
0,99124,15,2023-01-30T15:03:39.598Z,False,none,15
1,11443,0,2023-01-30T15:03:39.598Z,False,none,0
Option 2, if you prefere the xml.etree.ElementTree.
xml attribute values can be searched by .get()
:
import xml.etree.ElementTree as ET
tree = ET.parse('prod_id.xml')
root = tree.getroot()
for elem in root.iter():
# print(elem.tag, elem.attrib, elem.text)
if elem.tag == "record":
print("Product-id:",elem.get('product-id'))
Output:
Product-id: 99124
Product-id: 011443