Home > Blockchain >  Python XML to CSV - problem with property
Python XML to CSV - problem with property

Time:10-09

Today i try to convert simple xml to csv file. XML:

<cars>
<car>
<id>1234</id>
<name>Name Of Car</name>
<description>Nice Car! </description>
<image>http://www.images.com</image>
<cost>432</cost>
<cat>roadster</cat>
<property name="Url">someurl</property>
<property name="Color">red</property>
<property name="Produce">fiat</property>
<property name="SystemID">15</property>
</car>
</cars>
tree = ET.parse("test.xml")
root = tree.getroot()
Resident_data = open('ResidentData.csv', 'w', encoding='utf-8-sig')
csvwriter = csv.writer(Resident_data)
resident_head = []
count = 0
start = 0
for member in root.findall('car'):
    offers = []
    if count == 0:
        resident_head.append("ID")
        resident_head.append("NAME")
        resident_head.append("DESC")
        resident_head.append("IMG")
        resident_head.append("PRICE")
        resident_head.append("TYPE")
        resident_head.append("URL")
        resident_head.append("COLOR")
        resident_head.append("PRODUCE")
        resident_head.append("SYSID")
        
        csvwriter.writerow(resident_head)
        count = count   1

    ID = member.find('id').text
    offers.append(ID)
    NAME = member.find('name').text
    offers.append(NAME)
    DESC = member.find('description').text
    offers.append(DESC)
    IMG = member.find('image').text
    offers.append(IMG)
    PRICE = member.find('cost').text
    offers.append(PRICE)
    CAT = member.find('cat').text
    offers.append(CAT)
    
    csvwriter.writerow(offers)
    start = start   1
    print(start)

Resident_data.close()

Output: |ID|NAME|DESC|IMG|PRICE|TYPE|URL|COLOR|PRODUCE|SYSID| |--|----|----|---|-----|----|---|-----|-------|-----| |1234|Name Of car|Nice Car!|http:..|432|roadster|

All working good but for this moment, but i don't know how to loop last columns, because here is <property name="x">. Also sometimes SYSID or COLOR is none (empty in system)

Ps. In preview my table looks ok...

Thanks! :)

CodePudding user response:

See below (The code builds the csv headers dynamically based on the xml data)

import xml.etree.ElementTree as ET
import csv


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<cars>
   <car>
      <id>1234</id>
      <name>Name Of Car</name>
      <description>Nice Car!</description>
      <image>http://www.images.com</image>
      <cost>432</cost>
      <cat>roadster</cat>
      <property name="Url">someurl</property>
      <property name="Color">red</property>
      <property name="Produce">fiat</property>
      <property name="SystemID">15</property>
   </car>
   <car>
      <id>1235</id>
      <name>Name Of Car1</name>
      <description>Nice Car!1</description>
      <image>http://www.images.com</image>
      <cost>435</cost>
      <cat>roadster45</cat>
      <property name="Url">someurl34</property>
      <property name="Color">green</property>
      <property name="Produce">nest</property>
      <property name="SystemID">45</property>
   </car>
</cars>'''
root = ET.fromstring(xml)
headers = []
data = []
for idx,car in enumerate(root.findall('.//car')):
  if idx == 0:
    for p in list(car):
      if p.tag != 'property':
        headers.append(p.tag.upper())
      else:
        headers.append(p.attrib['name'].upper())
  data.append([p.text for p in list(car)])

with open('out.csv', 'w') as f:
    wr = csv.writer(f)
    wr.writerow(headers)
    for entry in data:
      wr.writerow(entry)

out.csv

ID,NAME,DESCRIPTION,IMAGE,COST,CAT,URL,COLOR,PRODUCE,SYSTEMID
1234,Name Of Car,Nice Car!,http://www.images.com,432,roadster,someurl,red,fiat,15
1235,Name Of Car1,Nice Car!1,http://www.images.com,435,roadster45,someurl34,green,nest,45

CodePudding user response:

If there's possible extra fields or some fields missing, it could be a good option to use DictWriter. I your example <cars> node (which is root node) contains only <car> child nodes, so we can just iterate over root node. If your real XML is more complex, you can use .iterparse() with simple .//car XPath expression.

Code:

import xml.etree.ElementTree as ET
from csv import DictWriter

xml = ...
root = ET.fromstring(xml)  # replace with initialization from file (if needed)

with open("result.csv", "w", newline="") as f:
    writer = None
    for car_node in root:
        car = {node.get("name") or node.tag: node.text for node in car_node}
        if not writer:
            writer = DictWriter(f, car, extrasaction="ignore")
            writer.writeheader()
        writer.writerow(car)
  • Related