Getting text of children tags with CSS selector with Scrapy returns nothing

Time:12-05

While this looks like a very common question, I have tried many different approaches to scrape all the text recursively from the following HTML, but for some reason none of them worked:

<span >
      <span ><b>20</b>%</span>
      <span >Cupom</span>
</span>

What I tried:

p.css('span.coupon__logo coupon__logo--for-shops *::text').get()

p.css('span.amount ::text').get()

p.css('span.amount *::text').get()

And even an XPath one:

p.xpath('//span[@]//text()').get()
p.xpath('//span[@]//text()').get()

The best I got was p.css('span.amount *::text').getall(), but it extracts the text from all of the occurrences, which requires me to write extra code to organize them individually. It would be much better to get only the text of the current instance, especially because I'm looping through many of them, and because the extra code would be vulnerable to any changes on the website.

CodePudding user response:

Instead of getting the text of all the children of <span >, you can get the text of specific children.

CSS:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span *::text').getall())
Out[1]: '20 % Cupom'

XPath:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.xpath('//span[@]/span//text()').getall())
Out[1]: '20 % Cupom'

If there are more span tags and you only want the amount and the type, you can use this:

CSS:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span.amount *::text, span.type::text').getall())
Out[1]: '20 % Cupom'

XPath:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.xpath('//span[@]/span[@ or @]//text()').getall())
Out[1]: '20 % Cupom'