I am learning my way around python and right now I need a little bit of help. I have an XML file from soap api that I am failing at converting to CSV. I managed to get the data with the request library easily. My struggle is converting it to CSV, I end up with headers with no values
My XML Data :
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>
My python code looks like this :
import xml.etree.ElementTree as ET
import pandas as pd
cols = ['Date', 'RegisteredDate', 'Type',
'TypeDescription']
rows = []
# parse xml file
xmlparse = ET.parse('xmldata.xml')
root = xmlparse.getroot()
for i in root:
Date = i.get('Date').text
RegisteredDate = i.get('RegisteredDate').text
Type = i.get('Type').text
TypeDescription = i.get('TypeDescription').text
rows.append({'Date': Date,
'RegisteredDate': RegisteredDate,
'Type': Type,
'TypeDescription': TypeDescription})
df = pd.DataFrame(rows, columns=cols)
print(df)
df.to_csv('csvdata.csv')
In my approach, I was following the idea from here https://www.geeksforgeeks.org/convert-xml-to-csv-in-python/
CodePudding user response:
You probably don't need to go through ElementTree; you can feed the xml directly to pandas. If I understand you correctly, this should do it:
df = pd.read_xml(path_to_file,"//*[local-name()='MainVIP']")
df = df.iloc[:,:4]
df
Output from your xml above:
Date RegisteredDate Type TypeDescription
0 20210616 20210216 YMBA TYPE OF ENQUIRY
CodePudding user response:
Without any external lib - the code below generates a csv file.
The idea is to collect the required elements data from MainVip
and store it in list of dicts. Loop on the list and write the data into a file.
import xml.etree.ElementTree as ET
xml = ''' <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>'''
cols = ['Date', 'RegisteredDate', 'Type',
'TypeDescription']
rows = []
NS = '{https://xxxxxxxxxx/xxxxxxx}'
root = ET.fromstring(xml)
for vip in root.findall(f'.//{NS}MainVIP'):
rows.append({c: vip.find(NS c).text for c in cols})
with open('out.csv','w') as f:
f.write(','.join(cols) '\n')
for row in rows:
f.write(','.join(row[c] for c in cols) '\n')
out.csv
Date,RegisteredDate,Type,TypeDescription
20210616,20210216,YMBA,TYPE OF ENQUIRY