<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Report xsi:schemaLocation="test.xsd" xmlns="test2" xmlns:xsi="test3">
<Ver>55</Ver>
<data>
<title>abcd'</title>
<date>2011-11-12</date>
<ID>Asdf123</ID>
</data>
</Report>
Im trying to parse the "title" value "abcd" into first column of my excel file and the "date" value into second column etc. How can I do that? I tried the code:
import xml.etree.cElementTree as et
import pandas as pd
tree=et.parse('test.xml')
root=tree.getroot()
Title= []
Date= []
for title in root.iter('title'):
Title.append(title.text)
for date in root.iter('date'):
Date.append(date.text)
print(Title)
print(Date)
Jobs_df = pd.DataFrame(
list(zip(Title,Date)),
columns=['Title', 'Date'])
Jobs_df.to_csv("result.csv")
although when I print the array it is empty. I tried to do different things with findall() function etc.
I tried to do following aswell:
print(root[1][0].text)
and it prints me the title succesfully, but I dont want to do it this way, since in the future I will get multiple titles and dates and I want to iterate through them all (they have the same name but different IDs) and not only the specific ones
CodePudding user response:
array it is empty
You are using XML with namespace therefore you need to look for {test2}title
and {test2}date
rather than title
and date
, that is replace
for title in root.iter('title'):
Title.append(title.text)
for date in root.iter('date'):
Date.append(date.text)
using
for title in root.iter('{test2}title'):
Title.append(title.text)
for date in root.iter('{test2}date'):
Date.append(date.text)
Read Parsing XML with Namespaces if you want to know more regarding processing XML sporting such feature.
CodePudding user response:
You can do like this using pandas:
import pandas as pd
df = pd.read_xml('test.xml')
df = df.iloc[:,1:3]
df
df.to_csv('test.csv')