Find the value using XPath

Time:05-18

I have an HTML table:

<div >
    <div >property 1</div>
    <div >value</div>
</div>
<div >
    <div >property 2</div>
    <div >value</div>
</div>
<div >
    <div >property 3</div>
    <div >value</div>
</div>
<div >
    <div >property 4</div>
    <div >value</div>
</div>

I need to catch/get the property 4 value...

for item in response.css('div.parameters'):
    name = item.xpath('//div[text()[contains(.,"property 4")]]/following::div[1]/text()').get()

but it doesn't work. Where is the error?

CodePudding user response:

Try:

from lxml import etree as ET

xml_doc = """
<root>

<div >
    <div >property 1</div>
    <div >value 1</div>
</div>
<div >
    <div >property 2</div>
    <div >value 2</div>
</div>
<div >
    <div >property 3</div>
    <div >value 3</div>
</div>
<div >
    <div >property 4</div>
    <div >value 4</div>
</div>

</root>
"""

parsed = ET.fromstring(xml_doc)

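# These lookups rely on class attributes containing "property" and "value";
# the sample markup above is shown without them.
properties = parsed.xpath('//div[contains(@class, "property")]')
values = parsed.xpath('//div[contains(@class, "value")]')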

out = {p.text: v.text for p, v in zip(properties, values)}
print(out["property 4"])

Prints:

value 4
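
If the class names are not known, the value can also be found by position. The following is only a minimal sketch, assuming each property div is immediately followed by its value div inside the same parent; value_for is a hypothetical helper name and the shortened sample markup is made up for illustration:

```python
from lxml import etree as ET

xml_doc = """
<root>
<div>
    <div>property 3</div>
    <div>value 3</div>
</div>
<div>
    <div>property 4</div>
    <div>value 4</div>
</div>
</root>
"""

parsed = ET.fromstring(xml_doc)


def value_for(tree, name):
    # Hypothetical helper: find the div whose own text equals `name`,
    # then take the text of the first div sibling that follows it.
    hits = tree.xpath(
        '//div[normalize-space(text())=$name]/following-sibling::div[1]/text()',
        name=name,  # lxml passes keyword arguments in as XPath variables
    )
    return hits[0].strip() if hits else None


print(value_for(parsed, "property 4"))
```

Passing the name as an XPath variable avoids building the expression with string formatting, which matters if the property name may contain quotes.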

CodePudding user response:

//div[contains(.,"property 4")]/./div//text()

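The XPath above matches the outer div whose string value contains "property 4" (the /. step stays on the same node; it does not go one level up), then selects the text nodes of its child divs. The output is therefore the property name followed by its value: property 4 value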

Final xpath expression:

' '.join(response.xpath('//div[contains(.,"property 4")]/./div//text()').getall())

Verified in the Scrapy shell:

In [1]: from scrapy.selector import Selector

In [2]: %paste
html ='''
<div >
    <div >property 1</div>
    <div >value 1</div>
</div>
<div >
    <div >property 2</div>
    <div >value 2</div>
</div>
<div >
    <div >property 3</div>
    <div >value 3</div>
</div>
<div >
    <div >property 4</div>
    <div >value</div>
</div>
'''

## -- End pasted text --

In [3]: sel = Selector(text=html)

In [4]: 
   ...: ' '.join(sel.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Out[4]: 'property 4 value'