Parsing XML: AttributeError: 'NoneType' object has no attribute 'text'

Time:10-04

I'm currently having difficulty parsing an XML file.

The contents of my Output.xml are:

<?xml version="1.0" encoding="utf-8"?>
<file>
  <ALL_INSTANCES>
    <instance>
      <ID>1</ID>
      <start>0</start>
      <end>17.96</end>
      <code>14. Jordan Brian Henderson</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>1st half</text>
      </label>
      <pos_x>52.4</pos_x>
      <pos_y>34.0</pos_y>
    </instance>
    <instance>
      <ID>8</ID>
      <start>10.28</start>
      <end>30.28</end>
      <code>26. Andrew Robertson</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>1st half</text>
      </label>
      <pos_x>61.7</pos_x>
      <pos_y>68.0</pos_y>
    </instance>
    <instance>
      <ID>1321</ID>
      <start>3770.67</start>
      <end>3790.67</end>
      <code>3. Fabinho</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>2nd half</text>
      </label>
      <pos_x>62.7</pos_x>
      <pos_y>3.7</pos_y>
    </instance>
    <instance>
      <ID>1882</ID>
      <start>5695.17</start>
      <end>5715.17</end>
      <code>2. Fabio Cardoso</code>
      <label>
        <group>Team</group>
        <text>Porto</text>
      </label>
      <label>
        <group>Action</group>
        <text>Interceptions</text>
      </label>
      <label>
        <group>Half</group>
        <text>2nd half</text>
      </label>
      <pos_x>8.1</pos_x>
      <pos_y>46.3</pos_y>
    </instance>
  </ALL_INSTANCES>
</file>

The code I am running is:

import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["ID", "Start", "End", "Player", "Team", "Action","Half","x","y"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('Output.xml')
root = xmlparse.getroot()
for i in root:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player= i.find("code").text
    Team = i.findall("./label[1]/text")[0].text
    Action = i.findall("./label[2]/text")[0].text
    Half = i.findall("./label[3]/text")[0].text
    x = i.find("pos_x").text
    y = i.find("pos_y").text
  
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('output.csv')

but I'm getting this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-256-f9c125ff9538> in <module>
      7 root = xmlparse.getroot()
      8 for i in root:
----> 9     ID = i.find("ID").text
     10     Start = i.find("start").text
     11     End = i.find("end").text

AttributeError: 'NoneType' object has no attribute 'text'

I understand what the error means, but I don't understand why I'm getting it.

Answer:

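The loop "for i in root" only visits the direct children of <file>, and its only child is <ALL_INSTANCES>. find("ID") likewise searches only direct children, so on <ALL_INSTANCES> it returns None, and accessing .text on None raises the error. Iterate over the <instance> elements instead. Try this and take it from here: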

import xml.etree.ElementTree as ET
import pandas as pd


xml = '''<?xml version="1.0" encoding="utf-8"?>
<file>
  <ALL_INSTANCES>
    <instance>
      <ID>1</ID>
      <start>0</start>
      <end>17.96</end>
      <code>14. Jordan Brian Henderson</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>1st half</text>
      </label>
      <pos_x>52.4</pos_x>
      <pos_y>34.0</pos_y>
    </instance>
    <instance>
      <ID>8</ID>
      <start>10.28</start>
      <end>30.28</end>
      <code>26. Andrew Robertson</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>1st half</text>
      </label>
      <pos_x>61.7</pos_x>
      <pos_y>68.0</pos_y>
    </instance>
    <instance>
      <ID>1321</ID>
      <start>3770.67</start>
      <end>3790.67</end>
      <code>3. Fabinho</code>
      <label>
        <group>Team</group>
        <text>Liverpool FC</text>
      </label>
      <label>
        <group>Action</group>
        <text>Passes accurate</text>
      </label>
      <label>
        <group>Half</group>
        <text>2nd half</text>
      </label>
      <pos_x>62.7</pos_x>
      <pos_y>3.7</pos_y>
    </instance>
    <instance>
      <ID>1882</ID>
      <start>5695.17</start>
      <end>5715.17</end>
      <code>2. Fabio Cardoso</code>
      <label>
        <group>Team</group>
        <text>Porto</text>
      </label>
      <label>
        <group>Action</group>
        <text>Interceptions</text>
      </label>
      <label>
        <group>Half</group>
        <text>2nd half</text>
      </label>
      <pos_x>8.1</pos_x>
      <pos_y>46.3</pos_y>
    </instance>
  </ALL_INSTANCES>
</file>'''

fields = ['ID','start','end','code','pos_x','pos_y']

root = ET.fromstring(xml)
data = []
for inst in root.findall('.//instance'):
    data.append({f:inst.find(f).text for f in fields})
df = pd.DataFrame(data)

print(df)

Output:

     ID    start      end                        code pos_x pos_y
0     1        0    17.96  14. Jordan Brian Henderson  52.4  34.0
1     8    10.28    30.28        26. Andrew Robertson  61.7  68.0
2  1321  3770.67  3790.67                  3. Fabinho  62.7   3.7
3  1882  5695.17  5715.17            2. Fabio Cardoso   8.1  46.3
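To see directly why the original loop fails, a quick check helps. The sketch below uses a trimmed-down copy of the sample XML (one instance, only the ID field, an assumption for brevity) and shows which element "for i in root" actually visits and what find("ID") returns at each level.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the Output.xml sample (assumption: one instance, ID only)
xml = '''<file>
  <ALL_INSTANCES>
    <instance>
      <ID>1</ID>
    </instance>
  </ALL_INSTANCES>
</file>'''

root = ET.fromstring(xml)

# Iterating the root only visits its direct children: here just <ALL_INSTANCES>
visited = [child.tag for child in root]
print(visited)

# find() also searches direct children only, so <ID> is not found on <ALL_INSTANCES>
print(root[0].find("ID"))

# iter() walks the whole subtree, so it reaches every <instance>, which does have <ID>
ids = [inst.find("ID").text for inst in root.iter("instance")]
print(ids)
```

The same reasoning is why './/instance' in the answer above works: the leading './/' searches at any depth below the root rather than only its direct children.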

Answer:

It seems all children of each instance element are needed, and in the XML sample they always appear in the same order, so they can be handled sequentially.

Use XPath to find all instance elements and iterate over them:

from lxml import etree

tree = etree.parse('Output.xml')
# Output column names, and the child tag expected at each position of <instance>
cols = ["ID", "Start", "End", "Player", "Team", "Action","Half","x","y"]
xcols = ["ID", "start", "end", "code", "label", "label","label","pos_x","pos_y"]
steps = tree.xpath('//instance')
for s in steps:
    row = dict()
    x = 0
    # Walk the children of <instance> in document order
    for i in s:
        if xcols[x] != 'label':
            row[cols[x]] = i.text
        else:
            # For <label> elements, take the text of the nested <text> child
            t = i.find('./text')
            row[cols[x]] = t.text
        x += 1
    print(row)

Result:

{'ID': '1', 'Start': '0', 'End': '17.96', 'Player': '14. Jordan Brian Henderson', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '1st half', 'x': '52.4', 'y': '34.0'}
{'ID': '8', 'Start': '10.28', 'End': '30.28', 'Player': '26. Andrew Robertson', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '1st half', 'x': '61.7', 'y': '68.0'}
{'ID': '1321', 'Start': '3770.67', 'End': '3790.67', 'Player': '3. Fabinho', 'Team': 'Liverpool FC', 'Action': 'Passes accurate', 'Half': '2nd half', 'x': '62.7', 'y': '3.7'}
{'ID': '1882', 'Start': '5695.17', 'End': '5715.17', 'Player': '2. Fabio Cardoso', 'Team': 'Porto', 'Action': 'Interceptions', 'Half': '2nd half', 'x': '8.1', 'y': '46.3'}
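The approach above relies on the children of <instance> always appearing in the same order. If that ordering might vary, an alternative (not taken from either answer) is to look up each <label> by its <group> value. The sketch below uses only the standard library xml.etree.ElementTree, keeps the question's column names, and assumes each <label> has one <group> and one <text> child. parse_instances is a hypothetical helper name, and the sample reuses values from the question with the labels deliberately reordered to show that position no longer matters.

```python
import xml.etree.ElementTree as ET

# Question's column names mapped to the child tag holding each plain value
SIMPLE_FIELDS = {"ID": "ID", "Start": "start", "End": "end",
                 "Player": "code", "x": "pos_x", "y": "pos_y"}
# Columns that come from <label> elements, matched by their <group> text
LABEL_GROUPS = ["Team", "Action", "Half"]
COLS = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]


def parse_instances(root):
    """Hypothetical helper: return one dict per <instance> element."""
    rows = []
    # Step through <ALL_INSTANCES> explicitly instead of iterating the root
    for inst in root.findall("ALL_INSTANCES/instance"):
        # findtext() returns None for a missing child instead of raising
        row = {col: inst.findtext(tag) for col, tag in SIMPLE_FIELDS.items()}
        # Map each label's <group> to its <text>, independent of position
        labels = {lbl.findtext("group"): lbl.findtext("text")
                  for lbl in inst.findall("label")}
        for group in LABEL_GROUPS:
            row[group] = labels.get(group)
        # Rebuild the dict so keys follow the question's column order
        rows.append({col: row[col] for col in COLS})
    return rows


# Values from the question's last instance; labels deliberately out of order
sample = '''<file><ALL_INSTANCES>
  <instance>
    <ID>1882</ID><start>5695.17</start><end>5715.17</end>
    <code>2. Fabio Cardoso</code>
    <label><group>Half</group><text>2nd half</text></label>
    <label><group>Team</group><text>Porto</text></label>
    <label><group>Action</group><text>Interceptions</text></label>
    <pos_x>8.1</pos_x><pos_y>46.3</pos_y>
  </instance>
</ALL_INSTANCES></file>'''

rows = parse_instances(ET.fromstring(sample))
print(rows[0])
```

With the real file, ET.parse('Output.xml').getroot() can be passed in instead, and the resulting rows list can go straight into pd.DataFrame(rows, columns=cols) as in the question's code.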