Home > Software design >  Parse XML to CSV with python
Parse XML to CSV with python

Time:02-03

I need to parse some XML to CSV. I am struggling getting the 'record' attribute to iterate. The code below can pull out the allocation text. How do I get the record product-id?

    import xml.etree.ElementTree as ET
    mytree = ET.parse('Salesforce_01_30_2023.xml')
    myroot = mytree.getroot()
    print(myroot)
    
    for x in myroot.findall('record'):
        product = myroot.attrib
        inventory = x.find('allocation').text
        print(product, inventory)

XML

<?xml version="1.0" encoding="UTF-8"?>
        <records>
            <record product-id="99124">
                <allocation>15</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>15</ats>
            </record>
            <record product-id="011443">
                <allocation>0</allocation>
                <allocation-timestamp>2023-01-30T15:03:39.598Z</allocation-timestamp>
                <perpetual>false</perpetual>
                <preorder-backorder-handling>none</preorder-backorder-handling>
                <ats>0</ats>
            </record>

CodePudding user response:

To get product-id number you can use .attrib["product-id"]:

import xml.etree.ElementTree as ET

mytree = ET.parse('Salesforce_01_30_2023.xml')
myroot = mytree.getroot()

for product in myroot.findall('record'):
    inventory = product.find('allocation').text
    print(product.attrib['product-id'], inventory)

Prints:

99124 15
011443 0

CodePudding user response:

Option 1: You can use pandas DataFrame read_xml() and to_csv():

import pandas as pd

df = pd.read_xml("prod_id.xml", xpath=".//record")
df.to_csv('prod.csv')
print(df.to_string())

Output:

   product-id  allocation      allocation-timestamp  perpetual preorder-backorder-handling  ats
0       99124          15  2023-01-30T15:03:39.598Z      False                        none   15
1       11443           0  2023-01-30T15:03:39.598Z      False                        none    0

CSV:

,product-id,allocation,allocation-timestamp,perpetual,preorder-backorder-handling,ats
0,99124,15,2023-01-30T15:03:39.598Z,False,none,15
1,11443,0,2023-01-30T15:03:39.598Z,False,none,0

Option 2, if you prefere the xml.etree.ElementTree. xml attribute values can be searched by .get():

import xml.etree.ElementTree as ET
    
tree = ET.parse('prod_id.xml')
root = tree.getroot()
    
for elem in root.iter():
    # print(elem.tag, elem.attrib, elem.text)
    if elem.tag == "record":
        print("Product-id:",elem.get('product-id'))

Output:

Product-id: 99124
Product-id: 011443
  • Related