I am trying to get some text out of a table in an online XML file. I can find the tables:
```python
from lxml import etree
import requests

main_file = requests.get('https://training.gov.au/TrainingComponentFiles/CUA/CUAWRT601_R1.xml')
main_file.encoding = 'utf-8-sig'  # only affects main_file.text; .content below is raw bytes
root = etree.fromstring(main_file.content)
# Bind the prefix "foo" to the namespace the elements live in, for use in XPath
tables = root.xpath('//foo:table', namespaces={"foo": "http://www.authorit.com/xml/authorit"})
print(tables)
```
But I can't get any further than that. The text that I am looking for is:
- Prepare to write scripts
- Write draft scripts
- Produce final scripts
When I paste the XML into http://xpather.com/, I can get it with the following expression:

```
//table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()
```

but the same expression returns an empty list with lxml, and I'm out of ideas. How can I get that text?
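The symptom can be reproduced offline. The sketch below uses a small, hypothetical stand-in for the real file (the `<root>` wrapper and the table contents are assumptions modeled on the XPath above; only the namespace URI comes from the question). It shows that lxml reports the default namespace in `nsmap` and that the unprefixed path finds nothing.

```python
from lxml import etree

# Hypothetical minimal stand-in for the real file: <root> and its children are
# assumptions; the default namespace URI is the one used in the question.
SAMPLE = b"""<root xmlns="http://www.authorit.com/xml/authorit">
  <table><tr><td width="2700"><p id="4">Prepare to write scripts</p></td></tr></table>
</root>"""

root = etree.fromstring(SAMPLE)

# lxml exposes the default namespace in nsmap under the key None.
print(root.nsmap)  # {None: 'http://www.authorit.com/xml/authorit'}

# In XPath 1.0 an unprefixed name matches only elements in no namespace,
# so this returns an empty list even though the document has a <table>.
print(root.xpath('//table'))  # []
```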
Answer:
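Use the namespace prefix you declared (with namespaces={"foo": "http://www.authorit.com/xml/authorit"}) on every element step of the path. The elements in this file are in the http://www.authorit.com/xml/authorit namespace, and in XPath 1.0 (which lxml implements) an unprefixed name such as table only matches elements that are in no namespace. So instead of

```
//table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()
```

use

```
//foo:table[1]/foo:tr/foo:td[@width="2700"]/foo:p[@id="4"][not(*)]/text()
```

and pass the same namespaces mapping to root.xpath().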
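A minimal end-to-end sketch of that fix, run against a hypothetical inline sample instead of the live URL (the sample's `<root>` element and row contents are assumptions; the namespace URI and the XPath come from the question). The helper name `unit_titles` is also hypothetical.

```python
from lxml import etree

NS = {"foo": "http://www.authorit.com/xml/authorit"}

# Hypothetical stand-in for the downloaded file; only the namespace is real.
SAMPLE = b"""<root xmlns="http://www.authorit.com/xml/authorit">
  <table>
    <tr><td width="2700"><p id="4">Prepare to write scripts</p></td></tr>
    <tr><td width="2700"><p id="4">Write draft scripts</p></td></tr>
    <tr><td width="2700"><p id="4">Produce final scripts</p></td></tr>
  </table>
</root>"""


def unit_titles(xml_bytes):
    """Return the text of the width-2700 <p id="4"> cells in the first table."""
    root = etree.fromstring(xml_bytes)
    # Every element step carries the prefix. Unprefixed attributes are in no
    # namespace (a default namespace does not apply to them), so @width and @id
    # stay unprefixed.
    expr = '//foo:table[1]/foo:tr/foo:td[@width="2700"]/foo:p[@id="4"][not(*)]/text()'
    return [str(t).strip() for t in root.xpath(expr, namespaces=NS)]


print(unit_titles(SAMPLE))
# ['Prepare to write scripts', 'Write draft scripts', 'Produce final scripts']
```

With the real file, the call would be unit_titles(main_file.content). Note that //foo:table[1] selects every table that is the first table child of its parent; if only the first table in the whole document is wanted, (//foo:table)[1] is the stricter form.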