How do I find the element in this XML document?

Time:07-17

I am trying to extract the text of a few elements from an XML document in Python. Here is a snippet of the XML document, followed by my code:

<root>
   <doc>
      <field name="id">metadata_9606_SAMN03465421</field>
      <field name="is_metadata">true</field>
      <field name="is_sample">true</field>
      <field name="project_desc">PRJNA280600</field>
      <field name="taxid">9606</field>
      <field name="source_name">uterus</field>
      <field name="sample_id">SAMN03465421</field>
      <field name="exp_Mcount">13341.1</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465421</field>
      <field name="gene">1</field>
      <field name="sample_id">SAMN03465421</field>
      <field name="source_name">uterus</field>
      <field name="var">0</field>
      <field name="full_rpkm">0.133911</field>
      <field name="exp_rpkm">0.134</field>
      <field name="exp_total">3155</field>
      <field name="project_desc">PRJNA280600</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465420</field>
      <field name="gene">1</field>
      <field name="sample_id">SAMN03465420</field>
      <field name="source_name">trachea</field>
      <field name="var">0</field>
      <field name="full_rpkm">0.0232912</field>
      <field name="exp_rpkm">0.0233</field>
      <field name="exp_total">604</field>
      <field name="project_desc">PRJNA280600</field>
   </doc>
</root>

Here is my code:

import lxml.etree
tree = lxml.etree.parse(<PATH TO DOCUMENT>)
root = tree.getroot()
print(root.findall('/doc/field name[4]'))

I would like to print the fourth "field" element in each "doc" element, selected with an XPath-style path. However, I get this error when I run this code:

Traceback (most recent call last):
  File "/home/alex/PycharmProjects/gene_expression_ftp/main.py", line 4, in <module>
    print(root.findall('/doc/field name[4]'))
  File "src/lxml/etree.pyx", line 1575, in lxml.etree._Element.findall
  File "src/lxml/_elementpath.py", line 334, in lxml._elementpath.findall
  File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
  File "src/lxml/_elementpath.py", line 281, in lxml._elementpath._build_path_iterator
SyntaxError: cannot use absolute path on element
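The error comes from the leading "/": findall() evaluates an ElementPath expression relative to the element it is called on, so an absolute path is rejected. "field name" is also not valid path syntax, because the tag is field and name is one of its attributes. A minimal sketch using the standard library's xml.etree.ElementTree (lxml's findall() accepts the same relative syntax), assuming the question's sample is inlined as a string and trimmed to the first four fields of each doc, which does not change which field is fourth:

```python
import xml.etree.ElementTree as ET

# Sample from the question, trimmed to the first four <field> elements
# of each <doc> (assumption: inlined instead of read from disk).
SAMPLE = """<root>
   <doc>
      <field name="id">metadata_9606_SAMN03465421</field>
      <field name="is_metadata">true</field>
      <field name="is_sample">true</field>
      <field name="project_desc">PRJNA280600</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465421</field>
      <field name="gene">1</field>
      <field name="sample_id">SAMN03465421</field>
      <field name="source_name">uterus</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465420</field>
      <field name="gene">1</field>
      <field name="sample_id">SAMN03465420</field>
      <field name="source_name">trachea</field>
   </doc>
</root>"""

root = ET.fromstring(SAMPLE)

# Relative path (no leading "/"): each <doc> child of <root>, then the
# fourth <field> child of that <doc>. ElementPath positions start at 1.
fourth_fields = root.findall("./doc/field[4]")

for field in fourth_fields:
    print(field.get("name"), field.text)
```

With lxml, full XPath is also available through the element's xpath() method, for example root.xpath('doc/field[4]/text()'), which returns the text values directly.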

Answer:

I found a solution using a library called xmltodict (from the question "Finding element in xml with python"). The docs are here: https://xmltodict.readthedocs.io/en/stable/README/

Here is my code:

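import xmltodict

with open(<PATH TO FILE>, 'r') as gene_exps:
    # force_list keeps 'doc' and 'field' as lists even if only one is present
    data = xmltodict.parse(gene_exps.read(), force_list=('doc', 'field'))

for doc in data['root']['doc']:
    for item in doc.get('field'):
        # xmltodict stores attributes under '@name' keys and element text under '#text'
        if item.get('@name') == 'source_name':
            print(item.get('#text'))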
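For comparison, the same lookup can be done with only the standard library: ElementTree's [@attrib='value'] predicate selects a field by its name attribute, so its position inside the doc does not matter. A minimal sketch, assuming the sample is inlined as a string and trimmed to a few fields per doc:

```python
import xml.etree.ElementTree as ET

# Sample from the question, trimmed to a few fields per <doc>
# (assumption: inlined instead of read from disk).
SAMPLE = """<root>
   <doc>
      <field name="id">metadata_9606_SAMN03465421</field>
      <field name="source_name">uterus</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465421</field>
      <field name="source_name">uterus</field>
      <field name="var">0</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465420</field>
      <field name="source_name">trachea</field>
      <field name="var">0</field>
   </doc>
</root>"""

root = ET.fromstring(SAMPLE)

source_names = []
for doc in root.iter("doc"):
    # [@name='source_name'] matches the <field> whose name attribute equals
    # 'source_name'; find() returns the first match or None.
    field = doc.find("field[@name='source_name']")
    if field is not None:
        source_names.append(field.text)

print(source_names)  # ['uterus', 'uterus', 'trachea']
```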

Answer:

Try this:

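import xml.etree.ElementTree as ET


def search():
    tree = ET.parse('file.xml')
    root = tree.getroot()

    # Walk every <doc> element directly under <root>
    for doc in root.findall("doc"):
        # Each child is a <field>; print its name attribute and its text
        for item in doc:
            print(item.attrib['name'], item.text)


search()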
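Building on the loop above, one possible extension (not part of the original answer) is to turn each doc into a dictionary keyed by the name attribute, so fields can be read by name instead of by position. A minimal sketch, assuming the sample is inlined as a trimmed string; if a doc repeated a name, the later field would overwrite the earlier one:

```python
import xml.etree.ElementTree as ET

# Sample from the question, trimmed to a few fields per <doc>
# (assumption: inlined instead of read from 'file.xml').
SAMPLE = """<root>
   <doc>
      <field name="id">metadata_9606_SAMN03465421</field>
      <field name="source_name">uterus</field>
      <field name="sample_id">SAMN03465421</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465421</field>
      <field name="sample_id">SAMN03465421</field>
      <field name="source_name">uterus</field>
   </doc>
   <doc>
      <field name="id">1_SAMN03465420</field>
      <field name="sample_id">SAMN03465420</field>
      <field name="source_name">trachea</field>
   </doc>
</root>"""

root = ET.fromstring(SAMPLE)

# One dict per <doc>, mapping each field's name attribute to its text.
records = [
    {field.get("name"): field.text for field in doc.findall("field")}
    for doc in root.findall("doc")
]

for record in records:
    print(record.get("sample_id"), record.get("source_name"))
```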