What would be the correct syntax for this:
//footer//a | (//a[not(//footer)] and position() <=200)
Use only //footer//a if it exists; if not, find all //a that are not in //footer and limit the result to 200.
CodePudding user response:
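You were really close. The | operator is a union rather than an OR: it returns every node matched by either operand. If no footer contains an <a> node, the first operand matches nothing and only the second one contributes.
Two caveats: when a footer with links does exist, an unguarded second operand still adds links from outside it, and //a[position()<=200] counts position among each parent's children, not across the whole document. An expression that covers both is //footer//a | (//a[not(//footer//a)])[position()<=200].
Using Python and parsel (Scrapy's HTML parser), with the simple union: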
>>> from parsel import Selector
>>> foo = Selector("<footer><a>text</a></footer>")
>>> bar = Selector("<div><a>text</a><a>text2</a><a>text3</a><a>text4</a></div>")
>>> foo.xpath("//footer//a | //a[position()<=2]").get()
'<a>text</a>'
>>> bar.xpath("//footer//a | //a[position()<=2]").extract()
['<a>text</a>', '<a>text2</a>']
Note: I used 2 instead of 200 for brevity.