how to get a value that has exactly the name of the class web scraping-CodePudding

I have this web page:

this code has the sizes in it. you can see that some class has: js-variant and some has js-variant disabled

what i want is to select the text where the class is js-variant using web scraping.

I tried doing:  response.css('a.js-variant ::text').extract()

but this does not work properly and gives all the values even the ones that has a class disabled in it.

how can I do that?

CodePudding user response：

There is a space in class like this js-variant and we can use xpath to get exact matches.

items = response.xpath("//a[@class='js-variant ']")

for item in items:
  item_text = item.css("::text").get().strip()

CodePudding user response：

There are two types of sizes which are active and disabled and they are to be diffrentiated by two class attributes js-variant and js-variant disabled

when you use js-variant then it's found in the both sizes. So the better choice here is xpath expression BUT the following xpath expression selects only the active sizes:

(//*[@])[2]/li//*[@]

As you are using response, assuming that you want to implement it with scrapy.

Example:

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'  
    def start_requests(self):
         yield scrapy.Request ('https://tr.uspoloassn.com/erkek-beyaz-polo-yaka-t-shirt-basic-50249146-vr013/',
         callback = self.parse
         )

    def parse(self, response):
        
        yield {'size':[x.get().strip() for x in response.xpath('(//*[@])[2]/li//*[@]/text()')]}

Output:

{'size': ['2XS', 'XS', 'S', 'M']}