Although this is a very common question, I have tried many different approaches to scrape all the text recursively from the following HTML, but for some reason none of them worked:
<span class="coupon__logo coupon__logo--for-shops">
<span class="amount"><b>20</b>%</span>
<span class="type">Cupom</span>
</span>
What I tried:
p.css('span.coupon__logo coupon__logo--for-shops *::text').get()
p.css('span.amount ::text').get()
p.css('span.amount *::text').get()
And even XPath:
p.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]//text()').get()
p.xpath('//span[@class="amount"]//text()').get()
The best thing I got was p.css('span.amount *::text').getall(), but it extracts the text from all of the occurrences, which requires me to write code to sort them out individually. It would be much better to get only the text of the current instance, especially because I'm looping through many of them and because the workaround would be vulnerable to any changes on the website.
CodePudding user response:
Instead of getting the text of all the children of <span class="coupon__logo coupon__logo--for-shops">, you can get the text of specific children.
CSS:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span *::text').getall())
Out[1]: '20 % Cupom'
xpath:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span//text()').getall())
Out[1]: '20 % Cupom'
If you have more span tags and you only want amount and type, you can use this:
CSS:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span.amount *::text, span.type::text').getall())
Out[1]: '20 % Cupom'
xpath:
scrapy shell file:///path/to/file.html
In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span[@class="amount" or @class="type"]//text()').getall())
Out[1]: '20 % Cupom'