I have an HTML table:
<div >
<div >property 1</div>
<div >value</div>
</div>
<div >
<div >property 2</div>
<div >value</div>
</div>
<div >
<div >property 3</div>
<div >value</div>
</div>
<div >
<div >property 4</div>
<div >value</div>
</div>
I need to get the value of property 4:
for item in response.css('div.parameters'):
    name = item.xpath('//div[text()[contains(.,"property 4")]]/following::div[1]/text()').get()
But it doesn't work. Where is the error?
CodePudding user response:
Try:
from lxml import etree as ET
xml_doc = """
<root>
<div >
<div >property 1</div>
<div >value 1</div>
</div>
<div >
<div >property 2</div>
<div >value 2</div>
</div>
<div >
<div >property 3</div>
<div >value 3</div>
</div>
<div >
<div >property 4</div>
<div >value 4</div>
</div>
</root>
"""
parsed = ET.fromstring(xml_doc)
properties = parsed.xpath('//div[contains(@class, "property")]')
values = parsed.xpath('//div[contains(@class, "value")]')
out = {p.text: v.text for p, v in zip(properties, values)}
print(out["property 4"])
Prints:
value 4
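The markup as displayed here carries no class attributes, so a predicate on @class would have nothing to match against it. As a rough alternative, here is a sketch that pairs names and values by position instead, using only the standard library. It assumes every row div holds exactly two child divs (name first, value second); the condensed one-row-per-line layout is just for brevity.

```python
import xml.etree.ElementTree as ET

# Same shape as the markup above, without class attributes.
# Assumption: each row <div> has exactly two child <div>s: name, then value.
xml_doc = """
<root>
  <div><div>property 1</div><div>value 1</div></div>
  <div><div>property 2</div><div>value 2</div></div>
  <div><div>property 3</div><div>value 3</div></div>
  <div><div>property 4</div><div>value 4</div></div>
</root>
"""

root = ET.fromstring(xml_doc)

# Build a name -> value mapping from the two children of every row.
out = {}
for row in root.findall("div"):
    name_div, value_div = list(row)
    out[name_div.text.strip()] = value_div.text.strip()

print(out["property 4"])  # value 4
```

If a row can have more or fewer than two children, the tuple unpacking raises a ValueError, which at least makes the broken assumption visible.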
CodePudding user response:
//div[contains(.,"property 4")]/./div//text()
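The predicate contains(.,"property 4") matches every div whose full text contains "property 4", which includes the outer row div. The /./ step is a no-op (. is the current node), so /div then selects that row's child divs and //text() collects their text. The output is therefore the property name followed by its value: property 4 value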
Final xpath expression:
' '.join(response.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Verified in the Scrapy shell:
In [1]: from scrapy.selector import Selector
In [2]: %paste
html ='''
<div >
<div >property 1</div>
<div >value 1</div>
</div>
<div >
<div >property 2</div>
<div >value 2</div>
</div>
<div >
<div >property 3</div>
<div >value 3</div>
</div>
<div >
<div >property 4</div>
<div >value</div>
</div>
'''
## -- End pasted text --
In [3]: sel = Selector(text=html)
In [4]:
...: ' '.join(sel.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Out[4]: 'property 4 value'
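If only the value is wanted, without the property name, one option is to match the name div exactly and step to its next sibling. The sketch below uses lxml directly so it runs outside a spider; the "value N" texts are placeholders made up for the example. It also shows a per-row variant: inside a loop, an expression starting with // searches the whole document again rather than the current item, so a leading . is probably what the question's code needs. Whether that is the only problem in the original spider is an assumption.

```python
from lxml import html as lxml_html

# Markup shaped like the question's, without class attributes; the
# "value N" texts are placeholders chosen for this example.
HTML = """<div>
  <div><div>property 1</div><div>value 1</div></div>
  <div><div>property 2</div><div>value 2</div></div>
  <div><div>property 3</div><div>value 3</div></div>
  <div><div>property 4</div><div>value 4</div></div>
</div>"""

page = lxml_html.fromstring(HTML)

# Global lookup: find the div whose whole text is exactly "property 4",
# then take its first following sibling div.
expr = '//div[normalize-space(.)="property 4"]/following-sibling::div[1]/text()'
value = page.xpath(expr)[0]
print(value)  # value 4

# Per-row lookup: the leading "." keeps each query inside the current row.
for row in page.xpath("./div"):
    if row.xpath("normalize-space(./div[1])") == "property 4":
        row_value = row.xpath("normalize-space(./div[2])")
        print(row_value)  # value 4
```

In Scrapy the same expression string can be passed to response.xpath(...).get(), which returns the first match or None.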