I am trying to find the text of a few elements in an XML document in Python. Here is a snippet of the XML document, followed by my code:
<root>
<doc>
<field name="id">metadata_9606_SAMN03465421</field>
<field name="is_metadata">true</field>
<field name="is_sample">true</field>
<field name="project_desc">PRJNA280600</field>
<field name="taxid">9606</field>
<field name="source_name">uterus</field>
<field name="sample_id">SAMN03465421</field>
<field name="exp_Mcount">13341.1</field>
</doc>
<doc>
<field name="id">1_SAMN03465421</field>
<field name="gene">1</field>
<field name="sample_id">SAMN03465421</field>
<field name="source_name">uterus</field>
<field name="var">0</field>
<field name="full_rpkm">0.133911</field>
<field name="exp_rpkm">0.134</field>
<field name="exp_total">3155</field>
<field name="project_desc">PRJNA280600</field>
</doc>
<doc>
<field name="id">1_SAMN03465420</field>
<field name="gene">1</field>
<field name="sample_id">SAMN03465420</field>
<field name="source_name">trachea</field>
<field name="var">0</field>
<field name="full_rpkm">0.0232912</field>
<field name="exp_rpkm">0.0233</field>
<field name="exp_total">604</field>
<field name="project_desc">PRJNA280600</field>
</doc>
</root>
Here is my code:
import lxml.etree
tree = lxml.etree.parse(<PATH TO DOCUMENT>)
root = tree.getroot()
print(root.findall('/doc/field name[4]'))
I would like to print the text of the fourth "field" element in each "doc" element. When I run this code, however, I get this error:
Traceback (most recent call last):
File "/home/alex/PycharmProjects/gene_expression_ftp/main.py", line 4, in <module>
print(root.findall('/doc/field name[4]'))
File "src/lxml/etree.pyx", line 1575, in lxml.etree._Element.findall
File "src/lxml/_elementpath.py", line 334, in lxml._elementpath.findall
File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
File "src/lxml/_elementpath.py", line 281, in lxml._elementpath._build_path_iterator
SyntaxError: cannot use absolute path on element
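The error comes from the path expression itself: `findall` takes a relative ElementPath expression, so a leading `/` is rejected when it is called on an element, and `field name` is not a valid step (the tag is `field`; `name` is an attribute). Below is a minimal sketch using the standard library's `xml.etree.ElementTree`, whose `findall` accepts the same limited ElementPath syntax as lxml's. The sample document is inlined as a string (an assumption so the snippet runs on its own), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined so the example is self-contained
XML = """<root>
<doc>
<field name="id">metadata_9606_SAMN03465421</field>
<field name="is_metadata">true</field>
<field name="is_sample">true</field>
<field name="project_desc">PRJNA280600</field>
<field name="taxid">9606</field>
<field name="source_name">uterus</field>
<field name="sample_id">SAMN03465421</field>
<field name="exp_Mcount">13341.1</field>
</doc>
<doc>
<field name="id">1_SAMN03465421</field>
<field name="gene">1</field>
<field name="sample_id">SAMN03465421</field>
<field name="source_name">uterus</field>
<field name="var">0</field>
<field name="full_rpkm">0.133911</field>
<field name="exp_rpkm">0.134</field>
<field name="exp_total">3155</field>
<field name="project_desc">PRJNA280600</field>
</doc>
<doc>
<field name="id">1_SAMN03465420</field>
<field name="gene">1</field>
<field name="sample_id">SAMN03465420</field>
<field name="source_name">trachea</field>
<field name="var">0</field>
<field name="full_rpkm">0.0232912</field>
<field name="exp_rpkm">0.0233</field>
<field name="exp_total">604</field>
<field name="project_desc">PRJNA280600</field>
</doc>
</root>"""

root = ET.fromstring(XML)

# Relative path (no leading "/"): the 4th <field> child of every <doc> under root.
# Positions in ElementPath predicates are 1-based, as in XPath.
fourth_fields = [field.text for field in root.findall("doc/field[4]")]
print(fourth_fields)

# Selecting by the name attribute does not depend on field order within a <doc>.
source_names = [field.text for field in root.findall("doc/field[@name='source_name']")]
print(source_names)
```

Selecting by `@name` is usually safer here, since the first `doc` orders its fields differently from the others. With lxml you can also call `root.xpath('doc/field[4]/text()')`, which evaluates full XPath 1.0 and returns the text strings directly.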
Answer:
I found a solution using a library called xmltodict (see "Finding element in xml with python"). The documentation is here: https://xmltodict.readthedocs.io/en/stable/README/
Here is my code:
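import xmltodict

with open(<PATH TO FILE>, 'r') as gene_exps:
    data = xmltodict.parse(gene_exps.read())

# data['root']['doc'] is a list of dicts, one per <doc> element
for doc in data['root']['doc']:
    # each field is a dict with '@name' (the attribute) and '#text' (the content)
    for item in doc.get('field'):
        if item.get('@name') == 'source_name':
            print(item.get('#text'))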
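If you would rather avoid a third-party dependency, the standard library's `xml.etree.ElementTree` can build a similar name-to-text mapping for each `doc`. This is a sketch, not part of the answer above: the XML is a trimmed copy of two `doc` elements from the question, and `records` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of two <doc> elements from the question
XML = """<root>
<doc>
<field name="id">1_SAMN03465421</field>
<field name="sample_id">SAMN03465421</field>
<field name="source_name">uterus</field>
</doc>
<doc>
<field name="id">1_SAMN03465420</field>
<field name="sample_id">SAMN03465420</field>
<field name="source_name">trachea</field>
</doc>
</root>"""

root = ET.fromstring(XML)

# One dict per <doc>, mapping each field's name attribute to its text
records = [
    {field.get("name"): field.text for field in doc.findall("field")}
    for doc in root.findall("doc")
]

for record in records:
    # .get() returns None for a doc that lacks the field instead of raising KeyError
    print(record.get("sample_id"), record.get("source_name"))
```

Building one dict per `doc` lets you look up any field by name without caring about its position, which is what the `@name == 'source_name'` check in the xmltodict loop does.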
Answer:
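Try this:
import xml.etree.ElementTree as ET

def search():
    tree = ET.parse('file.xml')
    root = tree.getroot()
    # each <doc> is a direct child of <root>
    for doc in root.findall("doc"):
        for item in doc:
            # print the field's name attribute together with its text
            print(item.attrib['name'], item.text)

search()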