Home > Net >  convert xml to csv using python
convert xml to csv using python

Time:09-22

I am learning my way around python and right now I need a little bit of help. I have an XML file from soap api that I am failing at converting to CSV. I managed to get the data with the request library easily. My struggle is converting it to CSV, I end up with headers with no values

My XML Data :

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
            <Level3>
                <ResponseStatus>Success</ResponseStatus>
                <ErrorMessage/>
                <Message>20 alert(s) generated for this period</Message>
                <ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
                <Something1>1</Something1>
                <Something2/>
                <Something3/>
                <Something4/>
                <VIP>
                    <MainVIP>
                        <Date>20210616</Date>
                        <RegisteredDate>20210216</RegisteredDate>
                        <Type>YMBA</Type>
                        <TypeDescription>TYPE OF ENQUIRY</TypeDescription>
                        <BusinessName>COMPANY NAME</BusinessName>
                        <ITNumber>987654321</ITNumber>
                        <RegistrationNumber>123456789</RegistrationNumber>
                        <SubscriberNumber>55889977</SubscriberNumber>
                        <SubscriberReference/>
                        <TicketNumber>1122336655</TicketNumber>
                        <SubscriberName>COMPANY NAME 2 </SubscriberName>
                        <CompletedDate>20210615</CompletedDate>
                    </MainVIP>
                </VIP>
                <Something5/>
                <Something6/>
                <Something7/>
                <Something8/>
                <Something9/>
                <PrincipalSomething10/>
                <PrincipalSomething11/>
                <PrincipalSomething12/>
                <PrincipalSomething13/>
                <Something14/>
                <Something15/>
                <Something16/>
                <Something17/>
                <Something18/>
                <PrincipalSomething19/>
                <PrincipalSomething20/>
            </Level3>
        </Level2>
    </soap:Body>
</soap:Envelope>

My python code looks like this :

import xml.etree.ElementTree as ET
import pandas as pd

cols = ['Date', 'RegisteredDate', 'Type',
        'TypeDescription']
rows = []

# parse xml file

xmlparse = ET.parse('xmldata.xml')
root = xmlparse.getroot()
for i in root:
    Date = i.get('Date').text
    RegisteredDate = i.get('RegisteredDate').text
    Type = i.get('Type').text
    TypeDescription = i.get('TypeDescription').text

    rows.append({'Date': Date,
                'RegisteredDate': RegisteredDate,
                'Type': Type,
                'TypeDescription': TypeDescription})

df = pd.DataFrame(rows, columns=cols)
print(df)

df.to_csv('csvdata.csv')

In my approach, I was following the idea from here https://www.geeksforgeeks.org/convert-xml-to-csv-in-python/

CodePudding user response:

You probably don't need to go through ElementTree; you can feed the xml directly to pandas. If I understand you correctly, this should do it:

df = pd.read_xml(path_to_file,"//*[local-name()='MainVIP']")
df = df.iloc[:,:4]
df

Output from your xml above:

    Date    RegisteredDate  Type    TypeDescription
0   20210616    20210216    YMBA    TYPE OF ENQUIRY

CodePudding user response:

Without any external lib - the code below generates a csv file.
The idea is to collect the required elements data from MainVip and store it in list of dicts. Loop on the list and write the data into a file.

import xml.etree.ElementTree as ET

xml = ''' <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
            <Level3>
                <ResponseStatus>Success</ResponseStatus>
                <ErrorMessage/>
                <Message>20 alert(s) generated for this period</Message>
                <ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
                <Something1>1</Something1>
                <Something2/>
                <Something3/>
                <Something4/>
                <VIP>
                    <MainVIP>
                        <Date>20210616</Date>
                        <RegisteredDate>20210216</RegisteredDate>
                        <Type>YMBA</Type>
                        <TypeDescription>TYPE OF ENQUIRY</TypeDescription>
                        <BusinessName>COMPANY NAME</BusinessName>
                        <ITNumber>987654321</ITNumber>
                        <RegistrationNumber>123456789</RegistrationNumber>
                        <SubscriberNumber>55889977</SubscriberNumber>
                        <SubscriberReference/>
                        <TicketNumber>1122336655</TicketNumber>
                        <SubscriberName>COMPANY NAME 2 </SubscriberName>
                        <CompletedDate>20210615</CompletedDate>
                    </MainVIP>
                </VIP>
                <Something5/>
                <Something6/>
                <Something7/>
                <Something8/>
                <Something9/>
                <PrincipalSomething10/>
                <PrincipalSomething11/>
                <PrincipalSomething12/>
                <PrincipalSomething13/>
                <Something14/>
                <Something15/>
                <Something16/>
                <Something17/>
                <Something18/>
                <PrincipalSomething19/>
                <PrincipalSomething20/>
            </Level3>
        </Level2>
    </soap:Body>
</soap:Envelope>'''

cols = ['Date', 'RegisteredDate', 'Type',
        'TypeDescription']
rows = []
NS = '{https://xxxxxxxxxx/xxxxxxx}'
root = ET.fromstring(xml)
for vip in root.findall(f'.//{NS}MainVIP'):
    rows.append({c: vip.find(NS c).text for c in cols})
with open('out.csv','w') as f:
    f.write(','.join(cols)   '\n')
    for row in rows:
        f.write(','.join(row[c] for c in cols)   '\n')

out.csv

Date,RegisteredDate,Type,TypeDescription
20210616,20210216,YMBA,TYPE OF ENQUIRY
  • Related