Scraping listed HTML values using scrapy

Time:03-01

I can't seem to figure out how to construct this XPath selector. I have even tried using nextsibling::text, but to no avail. I have also browsed Stack Overflow questions about scraping listed values but could not implement the answers correctly. I keep getting blank results. Any and all help would be appreciated. Thank you.

The website is: [screenshot not available]

Previous 2 selectors:

list_span = response.xpath(".//span[contains(@class,'value-chars')]//text()").extract()

list_a = response.xpath(".//a[contains(@class,'value-chars')]//text()").extract()

CodePudding user response:

You need two selectors: one will parse the keys and the other will parse the values. This will result in two lists that can be zipped together to give you the results you are looking for.

The CSS selectors could look like this:

Keys selector --> .chars-column li .key-chars

Values selector --> .chars-column li .value-chars

Once you extract both lists, you can zip them and consume them as key-value pairs.
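A minimal sketch of the zip approach, assuming the page really uses the .chars-column, .key-chars and .value-chars classes named above. The function names (element_texts, pair_chars, parse_chars) are made up for illustration, and response is expected to be the Scrapy response passed to a spider callback.

```python
def element_texts(response, css_query):
    """Return one whitespace-normalised string per element matched by css_query."""
    # normalize-space(.) collapses all text inside the element into a single
    # string, so a value split over several text nodes still counts once.
    return [el.xpath("normalize-space(.)").get() for el in response.css(css_query)]


def pair_chars(keys, values):
    """Zip the two lists into a dict; zip stops at the shorter list."""
    return dict(zip(keys, values))


def parse_chars(response):
    """Build a key -> value dict from the characteristics list (assumed markup)."""
    keys = element_texts(response, ".chars-column li .key-chars")
    values = element_texts(response, ".chars-column li .value-chars")
    return pair_chars(keys, values)
```

Inside a spider callback this could be used as `yield parse_chars(response)`. Because zip silently stops at the shorter list, it may be worth comparing the lengths of the two lists first if some items can lack a value.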

CodePudding user response:

Use this XPath to get Wood

//*[@]//span[2]//text()

Use this XPath to get 2015

//*[@]//a[text()="2015"]

CodePudding user response:

I suppose this is because the HTML is invalid (some span elements are not closed), so normal XPaths do not work.

This did give me results:

".//*[contains(@class,'value-chars')]"
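As a rough sketch of how that wildcard selector might be used in a spider, the snippet below collects the text nodes under every matching element and drops the whitespace-only ones, which are one possible source of blank results. The helper names clean_texts and extract_values are hypothetical, and response is assumed to be a Scrapy response.

```python
def clean_texts(raw_texts):
    """Strip surrounding whitespace and drop strings that end up empty."""
    stripped = (text.strip() for text in raw_texts)
    return [text for text in stripped if text]


def extract_values(response):
    """Collect the text of every element whose class contains 'value-chars'."""
    # The * matches span and a elements alike, so unclosed span tags that
    # change the parsed tree matter less than with a tag-specific XPath.
    raw = response.xpath(".//*[contains(@class,'value-chars')]//text()").getall()
    return clean_texts(raw)
```

For example, clean_texts(["\n  Wood ", "  ", "2015\n"]) keeps only the two non-blank values.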