Parsing data from XML into Excel-CodePudding

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Report xsi:schemaLocation="test.xsd" xmlns="test2" xmlns:xsi="test3">
    <Ver>55</Ver>
    <data>
        <title>abcd'</title>
        <date>2011-11-12</date>
        <ID>Asdf123</ID>
    </data>
</Report>

Im trying to parse the "title" value "abcd" into first column of my excel file and the "date" value into second column etc. How can I do that? I tried the code:

import xml.etree.cElementTree as et
import pandas as pd

tree=et.parse('test.xml')
root=tree.getroot()

Title= []
Date= []

for title in root.iter('title'):
    Title.append(title.text)

for date in root.iter('date'):
    Date.append(date.text)

print(Title)
print(Date)

Jobs_df = pd.DataFrame(
                       list(zip(Title,Date)), 
                       columns=['Title', 'Date'])
                       
                                       
Jobs_df.to_csv("result.csv")

although when I print the array it is empty. I tried to do different things with findall() function etc.

I tried to do following aswell:

print(root[1][0].text)

and it prints me the title succesfully, but I dont want to do it this way, since in the future I will get multiple titles and dates and I want to iterate through them all (they have the same name but different IDs) and not only the specific ones

CodePudding user response：

array it is empty

You are using XML with namespace therefore you need to look for {test2}title and {test2}date rather than title and date, that is replace

for title in root.iter('title'):
    Title.append(title.text)

for date in root.iter('date'):
    Date.append(date.text)

using

for title in root.iter('{test2}title'):
    Title.append(title.text)

for date in root.iter('{test2}date'):
    Date.append(date.text)

Read Parsing XML with Namespaces if you want to know more regarding processing XML sporting such feature.

CodePudding user response：

You can do like this using pandas:

import pandas as pd
df = pd.read_xml('test.xml')
df = df.iloc[:,1:3]
df
df.to_csv('test.csv')