Home > Software design >  Python to XML get multiple child
Python to XML get multiple child

Time:07-25

I am working with python, I need to move my xml to csv file, right now I have an issue with the tag "customfields" can be more than one tag with that name and inside each customfields tag I need to add fieldName and Value.

My code works but for fieldName and Value variables only store the last tag captured.

My result should be like these

ID. abbreviation active address. fieldName value.
2424 X1C-02220 true. my address Company
2424 X1C-02220 true. my address Org Tiber

But now is looking like these

ID. abbreviation active address. fieldName value.
2424 X1C-02220 true. my address Org Tiber

This is my xml:

<?xml version="1.0" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <soapenv:Body>
      <ns1:getListResponse xmlns:ns1="http://service.web.propertyware.realpage.com" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
         <getListReturn xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" soapenc:arrayType="xsd:anyType[824]" xsi:type="soapenc:Array">
            <getListReturn xmlns:ns2="urn:PWServices" xsi:type="ns2:Unit">
               <ID xsi:type="xsd:long">2424</ID>
               <abbreviation xsi:type="xsd:string">X1C-02220</abbreviation>
               <active xsi:type="xsd:boolean">true</active>
               <address xsi:type="xsd:string">my address</address>
               <customFields soapenc:arrayType="ns2:CustomField[129]" xsi:type="soapenc:Array">
                  <customFields xsi:type="ns2:CustomField">
                     <ID xsi:type="xsd:long">0</ID>
                     <clientData xsi:type="soapenc:Array" xsi:nil="true"/>
                     <dataType xsi:type="xsd:string">Text</dataType>
                     <definitionID xsi:type="xsd:long">0</definitionID>
                     <fieldName xsi:type="xsd:string">Company</fieldName>
                     <value xsi:type="xsd:string"/>
                  </customFields>
                    <customFields xsi:type="ns2:CustomField">
                     <fieldName xsi:type="xsd:string">Org</fieldName>
                     <value xsi:type="xsd:string">Tiber</value>
                  </customFields>
                 </customFields>
            </getListReturn>
         </getListReturn>
      </ns1:getListResponse>
   </soapenv:Body>
</soapenv:Envelope>

Here is my code

import xml.etree.ElementTree as ET
import pandas as pd
import csv

root = ET.parse('build.xml')  # replace file name

cols = ["ID","abbreviation","active","address"
        ,"fieldName","value"
        ]
rows = []


branches = root.findall('.//getListReturn/getListReturn')
for branch in branches:
    ID = branch.find('ID').text
    abbreviation= branch.find('abbreviation').text
    active= branch.find('active').text
    address= branch.find('address').text
    customs=branch.findall('.//customFields/customFields')
    for custom in customs:
        fieldName=custom.find('fieldName').text
        value=custom.find('value').text
    
    rows.append({"ID":ID,"abbreviation":abbreviation,
                 "active":active,"address":address
                 ,"fieldName":fieldName,"value":value
                })
df = pd.DataFrame(rows, columns=cols)
df.to_csv('output.csv')  

How to store all the data from "customfields" tag?

Regards

CodePudding user response:

Simply indent the rows.append inside the inner loop to capture all information under child element:

...
    for custom in customs:
        fieldName = custom.find('fieldName').text
        value = custom.find('value').text
    
        rows.append({
            "ID":ID, "abbreviation":abbreviation,
            "active":active, "address":address,
            "fieldName":fieldName, "value":value
        })

Consider also, list/dict comprehensions with a dictionary merge to combine both parent/child levels:

root = ET.parse('build.xml')

rows = [
    {
     **{
        "ID": branch.find('abbreviation').text,
        "abbreviation": branch.find('abbreviation').text,
        "active": branch.find('active').text,
        "address": branch.find('address').text,
        "fieldName":fieldName,
        "value":value
      },
     **{
        "fieldName": custom.find('fieldName').text,
        "value": custom.find('value').text
     }
    }
    
    for branch in root.findall('.//getListReturn/getListReturn')
    for custom in branch.findall('.//customFields/customFields')
]

df = pd.DataFrame(rows, columns=cols)
df.to_csv('output.csv')  
  • Related