Python - using element tree to get data from specific nodes in xml

Time:05-05

I have been looking around and there are a lot of similar questions, but sadly none of them solved my issue.

My XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
      </Settings>
    </Node>
  </Nodes>

With Python, I'm trying to get the values of Text Box (1) and Text Box (2) for each Node whose AdvSettings State is switched on ("On" or "Yes" in the file).

So in this case I would like a result like this:

ComponentID  State  Textbox1  Textbox2
1            On     SettingA  SettingB
3            On     SettingG  SettingH

I have made a few attempts but didn't get far. With the code below I managed to get the AdvSettings tag, but that's as far as I got:

import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch.xml')
root = tree.getroot()

for AdvSettings in root.iter('AdvSettings'):
    print(AdvSettings.tag, AdvSettings.attrib)

CodePudding user response:

You can use an XPath expression to find all the relevant nodes and then extract the needed data from them. Here is an example, with comments as explanation:

from lxml import etree

xml = etree.fromstring('''
  <Nodes>...
  </Nodes>
''')

# Use XPath to select the relevant nodes

on_nodes = xml.xpath("//Node[Settings[AdvSettings[@State='Yes' or @State='On']]]")

# Get all needed information from every node
data_collected = [dict(
    [("ComponentID", node.attrib['ComponentID'])] +
    [(c.get("name"), c.text) for c in node.find("Settings") if c.text]) for node in on_nodes]


# data_collected is now a list of dicts with all the relevant information.
# Print it out. pandas is used only for formatting and is optional
# (to_markdown also needs the tabulate package).
import pandas
print(pandas.DataFrame.from_records(data_collected).to_markdown(index=False))

This gives you output like:

|   ComponentID | Text Box (1)   | Text Box (2)   | Text Box (3)   | Text Box (4)   |
|--------------:|:---------------|:---------------|:---------------|:---------------|
|             1 | SettingA       | SettingB       | SettingC       | SettingD       |
|             3 | SettingG       | SettingH       | SettingI       | SettingJ       |
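
The question only asked for Text Box (1) and Text Box (2), so the records can be trimmed before printing. A minimal sketch, assuming data_collected has the shape shown above (the sample values are copied from the output table, and keep_columns is a hypothetical helper, not part of lxml or pandas):

```python
# Records shaped like data_collected above; values copied from the output table.
data_collected = [
    {"ComponentID": "1", "Text Box (1)": " SettingA ", "Text Box (2)": " SettingB ",
     "Text Box (3)": " SettingC ", "Text Box (4)": " SettingD "},
    {"ComponentID": "3", "Text Box (1)": " SettingG ", "Text Box (2)": " SettingH ",
     "Text Box (3)": " SettingI ", "Text Box (4)": " SettingJ "},
]


def keep_columns(records, columns):
    """Return new dicts holding only `columns`, with surrounding whitespace stripped.

    A column missing from a record becomes an empty string.
    """
    return [{col: (rec.get(col) or "").strip() for col in columns} for rec in records]


wanted = ["ComponentID", "Text Box (1)", "Text Box (2)"]
trimmed = keep_columns(data_collected, wanted)
for row in trimmed:
    print(row)
```

The trimmed list can be passed to pandas.DataFrame.from_records in the same way as data_collected.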

CodePudding user response:

Here is a solution using only the XML module from the Python standard library (pandas is used just to print the result):

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="Yes"/>
      </Settings>
    </Node>
  </Nodes>''' 

data = []
root = ET.fromstring(xml)
nodes = root.findall('.//Node')
for node in nodes:
    adv = node.find('.//AdvSettings')
    if adv is None:
        continue
    # Treat both "On" and "Yes" as enabled; a missing State counts as "Off"
    flag = adv.attrib.get('State', 'Off')
    if flag in ('On', 'Yes'):
        data.append({
            'id': node.attrib.get('ComponentID'),
            'txt_box_1': node.find('.//Value[@name="Text Box (1)"]').text.strip(),
            'txt_box_2': node.find('.//Value[@name="Text Box (2)"]').text.strip(),
        })

df = pd.DataFrame(data)
print(df)

Output:

  id txt_box_1 txt_box_2
0  1  SettingA  SettingB
1  3  SettingG  SettingH
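
The table in the question also has a State column, and node.find(...) returns None when a Value element is missing, which would make .text fail. A possible variant that covers both points, sketched with the standard library only: collect_enabled and SAMPLE are names made up for this illustration, the sample XML is a shortened version of the file from the question, and the state is reported exactly as it appears in the file (so "Yes" stays "Yes").

```python
import xml.etree.ElementTree as ET

# Shortened version of the XML from the question (assumption: same structure).
SAMPLE = """<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="3">
    <Settings>
      <Value name="Text Box (1)"> SettingG </Value>
      <AdvSettings State="Yes"/>
    </Settings>
  </Node>
</Nodes>"""


def collect_enabled(root, on_states=("On", "Yes")):
    """Collect ComponentID, State and the first two text boxes of enabled nodes."""
    rows = []
    for node in root.iter("Node"):
        adv = node.find("Settings/AdvSettings")
        state = adv.get("State") if adv is not None else None
        if state not in on_states:
            continue  # skip nodes that are off or have no AdvSettings
        row = {"ComponentID": node.get("ComponentID"), "State": state}
        for i in (1, 2):
            # findtext returns the default instead of failing on a missing Value
            text = node.findtext(f"Settings/Value[@name='Text Box ({i})']", default="")
            row[f"Textbox{i}"] = text.strip()
        rows.append(row)
    return rows


rows = collect_enabled(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

To read from the file in the question instead, ET.parse('XMLSearch.xml').getroot() can be passed in place of ET.fromstring(SAMPLE).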