I'm having difficulty parsing an XML file.
The contents of my Output.xml are:
<?xml version="1.0" encoding="utf-8"?>
<file>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<start>0</start>
<end>17.96</end>
<code>14. Jordan Brian Henderson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>52.4</pos_x>
<pos_y>34.0</pos_y>
</instance>
<instance>
<ID>8</ID>
<start>10.28</start>
<end>30.28</end>
<code>26. Andrew Robertson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>61.7</pos_x>
<pos_y>68.0</pos_y>
</instance>
<instance>
<ID>1321</ID>
<start>3770.67</start>
<end>3790.67</end>
<code>3. Fabinho</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>62.7</pos_x>
<pos_y>3.7</pos_y>
</instance>
<instance>
<ID>1882</ID>
<start>5695.17</start>
<end>5715.17</end>
<code>2. Fabio Cardoso</code>
<label>
<group>Team</group>
<text>Porto</text>
</label>
<label>
<group>Action</group>
<text>Interceptions</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>8.1</pos_x>
<pos_y>46.3</pos_y>
</instance>
</ALL_INSTANCES>
</file>
The code I am running is:
import xml.etree.ElementTree as Xet
cols = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]
rows = []
# Parsing the XML file
xmlparse = Xet.parse('Output.xml')
root = xmlparse.getroot()
for i in root:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player = i.find("code").text
    Team = i.findall("./label[1]/text")[0].text
    Action = i.findall("./label[2]/text")[0].text
    Half = i.findall("./label[3]/text")[0].text
    x = i.find("pos_x").text
    y = i.find("pos_y").text
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})
df = pd.DataFrame(rows, columns=cols)
# Writing dataframe to csv
df.to_csv('output.csv')
but I'm getting this error:
AttributeError: 'NoneType' object has no attribute 'text'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-256-f9c125ff9538> in <module>
7 root = xmlparse.getroot()
8 for i in root:
----> 9 ID = i.find("ID").text
10 Start = i.find("start").text
11 End = i.find("end").text
AttributeError: 'NoneType' object has no attribute 'text'
I understand the error, but I don't understand why I'm getting the error.
CodePudding user response:
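Iterating over root yields only its direct child, ALL_INSTANCES, not the instance elements, so i.find("ID") returns None. Search for the instance elements explicitly, then take it from here: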
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0" encoding="utf-8"?>
<file>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<start>0</start>
<end>17.96</end>
<code>14. Jordan Brian Henderson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>52.4</pos_x>
<pos_y>34.0</pos_y>
</instance>
<instance>
<ID>8</ID>
<start>10.28</start>
<end>30.28</end>
<code>26. Andrew Robertson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>61.7</pos_x>
<pos_y>68.0</pos_y>
</instance>
<instance>
<ID>1321</ID>
<start>3770.67</start>
<end>3790.67</end>
<code>3. Fabinho</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>62.7</pos_x>
<pos_y>3.7</pos_y>
</instance>
<instance>
<ID>1882</ID>
<start>5695.17</start>
<end>5715.17</end>
<code>2. Fabio Cardoso</code>
<label>
<group>Team</group>
<text>Porto</text>
</label>
<label>
<group>Action</group>
<text>Interceptions</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>8.1</pos_x>
<pos_y>46.3</pos_y>
</instance>
</ALL_INSTANCES>
</file>'''
fields = ['ID','start','end','code','pos_x','pos_y']
root = ET.fromstring(xml)
data = []
for inst in root.findall('.//instance'):
    # One dict per instance, keyed by child tag name
    data.append({f: inst.find(f).text for f in fields})
df = pd.DataFrame(data)
print(df)
output
     ID    start      end                        code  pos_x  pos_y
0     1        0    17.96  14. Jordan Brian Henderson   52.4   34.0
1     8    10.28    30.28        26. Andrew Robertson   61.7   68.0
2  1321  3770.67  3790.67                  3. Fabinho   62.7    3.7
3  1882  5695.17  5715.17            2. Fabio Cardoso    8.1   46.3
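The fields list above leaves out the three label elements that the question also wants (Team, Action, Half). A minimal sketch of one way to include them, assuming every label has a group and a text child as in the sample: key each label by its group text rather than by position. The XML here is a trimmed one-instance copy of the sample, and the resulting rows list can be passed to pd.DataFrame(rows) as before.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (one instance only).
xml = '''<file>
<ALL_INSTANCES>
<instance>
<ID>1882</ID>
<start>5695.17</start>
<end>5715.17</end>
<code>2. Fabio Cardoso</code>
<label><group>Team</group><text>Porto</text></label>
<label><group>Action</group><text>Interceptions</text></label>
<label><group>Half</group><text>2nd half</text></label>
<pos_x>8.1</pos_x>
<pos_y>46.3</pos_y>
</instance>
</ALL_INSTANCES>
</file>'''

root = ET.fromstring(xml)

# The root's only direct child is ALL_INSTANCES, which is why
# `for i in root: i.find("ID")` returns None in the question.
print([child.tag for child in root])

fields = ['ID', 'start', 'end', 'code', 'pos_x', 'pos_y']
rows = []
for inst in root.iter('instance'):
    row = {f: inst.findtext(f) for f in fields}
    # Key each label by its <group> text instead of by position.
    for label in inst.findall('label'):
        row[label.findtext('group')] = label.findtext('text')
    rows.append(row)

print(rows[0])
```

Keying by group name means the result does not change if the labels appear in a different order.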
CodePudding user response:
It seems all children of each instance element are needed, so they can be handled sequentially, in the order they appear in the XML sample.
Use XPath to find all instance elements and iterate over them:
from lxml import etree
tree = etree.parse('/home/lmc/tmp/tmp.xml')
# Output column names and the matching child tags, in document order
cols = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]
xcols = ["ID", "start", "end", "code", "label", "label", "label", "pos_x", "pos_y"]
steps = tree.xpath('//instance')
for s in steps:
    row = dict()
    x = 0
    for i in s:
        if xcols[x] != 'label':
            row[cols[x]] = i.text
        else:
            # A label's value lives in its nested text element
            t = i.find('./text')
            row[cols[x]] = t.text
        x += 1  # advance to the next column
    print(row)
Result:
{'ID': '1', 'Start': '0', 'End': '17.96', 'Player': '14. Jordan Brian Henderson', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '1st half', 'x': '52.4', 'y': '34.0'}
{'ID': '8', 'Start': '10.28', 'End': '30.28', 'Player': '26. Andrew Robertson', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '1st half', 'x': '61.7', 'y': '68.0'}
{'ID': '1321', 'Start': '3770.67', 'End': '3790.67', 'Player': '3. Fabinho', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '2nd half', 'x': '62.7', 'y': '3.7'}
{'ID': '1882', 'Start': '5695.17', 'End': '5715.17', 'Player': '2. Fabio Cardoso', 'Team': 'Porto', 'Action': 'Interceptions', 'Half': '2nd half', 'x': '8.1', 'y': '46.3'}
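The positional mapping above depends on every instance having all nine children in exactly that order. As a hedged alternative, here is a minimal standard-library sketch that looks children up by tag with findtext and a default, so a missing element gives an empty string instead of an AttributeError. The helper name instance_to_row and the incomplete second instance are made up for illustration and are not part of the original data. The CSV goes to an in-memory buffer here; the rows could equally be passed to pd.DataFrame(rows, columns=cols).

```python
import csv
import io
import xml.etree.ElementTree as ET

# First instance is from the sample; the second is a hypothetical,
# deliberately incomplete instance used to show the defaults.
xml = '''<file><ALL_INSTANCES>
<instance>
<ID>1</ID><start>0</start><end>17.96</end>
<code>14. Jordan Brian Henderson</code>
<label><group>Team</group><text>Liverpool FC</text></label>
<label><group>Action</group><text>Passes accurate</text></label>
<label><group>Half</group><text>1st half</text></label>
<pos_x>52.4</pos_x><pos_y>34.0</pos_y>
</instance>
<instance>
<ID>2</ID><start>5</start><end>9.5</end>
<code>Hypothetical player</code>
<label><group>Team</group><text>Porto</text></label>
</instance>
</ALL_INSTANCES></file>'''

cols = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]
# Output column -> child tag for the plain (non-label) fields.
simple = {"ID": "ID", "Start": "start", "End": "end",
          "Player": "code", "x": "pos_x", "y": "pos_y"}

def instance_to_row(inst):
    # findtext returns the default instead of raising when a child is absent.
    row = {col: inst.findtext(tag, default="") for col, tag in simple.items()}
    labels = {lab.findtext("group"): lab.findtext("text", default="")
              for lab in inst.findall("label")}
    for col in ("Team", "Action", "Half"):
        row[col] = labels.get(col, "")
    return row

root = ET.fromstring(xml)
rows = [instance_to_row(inst) for inst in root.iter("instance")]

# Write the rows as CSV into an in-memory buffer.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=cols, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```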