Home > Software design >  Parsing data from XML into Excel
Parsing data from XML into Excel

Time:03-12

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Report xsi:schemaLocation="test.xsd" xmlns="test2" xmlns:xsi="test3">
    <Ver>55</Ver>
    <data>
        <title>abcd'</title>
        <date>2011-11-12</date>
        <ID>Asdf123</ID>
    </data>
</Report>

Im trying to parse the "title" value "abcd" into first column of my excel file and the "date" value into second column etc. How can I do that? I tried the code:

import xml.etree.cElementTree as et
import pandas as pd

tree=et.parse('test.xml')
root=tree.getroot()

Title= []
Date= []

for title in root.iter('title'):
    Title.append(title.text)

for date in root.iter('date'):
    Date.append(date.text)

print(Title)
print(Date)

Jobs_df = pd.DataFrame(
                       list(zip(Title,Date)), 
                       columns=['Title', 'Date'])
                       
                                       
Jobs_df.to_csv("result.csv")

although when I print the array it is empty. I tried to do different things with findall() function etc.

I tried to do following aswell:

print(root[1][0].text)

and it prints me the title succesfully, but I dont want to do it this way, since in the future I will get multiple titles and dates and I want to iterate through them all (they have the same name but different IDs) and not only the specific ones

CodePudding user response:

array it is empty

You are using XML with namespace therefore you need to look for {test2}title and {test2}date rather than title and date, that is replace

for title in root.iter('title'):
    Title.append(title.text)

for date in root.iter('date'):
    Date.append(date.text)

using

for title in root.iter('{test2}title'):
    Title.append(title.text)

for date in root.iter('{test2}date'):
    Date.append(date.text)

Read Parsing XML with Namespaces if you want to know more regarding processing XML sporting such feature.

CodePudding user response:

You can do like this using pandas:

import pandas as pd
df = pd.read_xml('test.xml')
df = df.iloc[:,1:3]
df
df.to_csv('test.csv')
  • Related