I am trying to scrape the title and description of a specific product whose page is written in Greek, and I want the output in Greek as well. However, the spider does not show the full title. Why?
Here is my Scrapy code:
import scrapy


class FitcoupleSpider(scrapy.Spider):
    name = 'fitcouple'
    allowed_domains = ['fitcouple360.gr']
    start_urls = ['https://fitcouple360.gr/product/herbalife-f1-threptika-meal-bars-sokolata/']

    def parse(self, response):
        products = response.xpath(".//div[@class='content-page container']/div[contains(@class,'single-product product')]")
        for product in products:
            title = product.xpath("//div//div[@class='fixed-content']/h1/text()").get()
            print(title)
The product title should be "Θρεπτικό πρωτεϊνούχο ρόφημα Herbalife Formula 1 υγιεινό γεύμα Café Latte 550g",
but the output I get is "Herbalife F1 Θρεπτικά Meal Bars σοκολάτα":
2022-09-27 22:49:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fitcouple360.gr/product/herbalife-f1-threptika-meal-bars-sokolata/> (referer: None)
**Herbalife F1 Θρεπτικά Meal Bars σοκολάτα**
2022-09-27 22:49:56 [scrapy.core.engine] INFO: Closing spider (finished)
2022-09-27 22:49:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
CodePudding user response:
The URL above points to a product whose title is "Herbalife F1 Θρεπτικά Meal Bars σοκολάτα", so the output is correct. If you expected a different title, you may have pasted a different link; please check that the title you expect actually belongs to this link.
CodePudding user response:
The part of the HTML selected by xpath("//div//div[@class='fixed-content']/h1/text()")
does not contain "Θρεπτικό πρωτεϊνούχο ρόφημα Herbalife Formula 1 υγιεινό γεύμα Café Latte 550g";
instead it contains "Herbalife F1 Θρεπτικά Meal Bars σοκολάτα".
The problem is not the title or the Greek encoding.
The problem is either a wrong XPath or JavaScript that changes the data dynamically, so that you end up with the wrong product.
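A quick way to tell these two causes apart is to look at the raw HTML the spider actually received and list every h1 in it. The sketch below uses only the standard library and is not the asker's code: H1Collector, h1_titles and sample_html are made-up names, and sample_html is an invented stand-in for response.text. If the expected title is nowhere in the raw HTML, no XPath can find it, and the page is either a different product or is filled in later by JavaScript.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text of every <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # greater than 0 while we are inside an <h1>
        self._buffer = []  # text chunks of the <h1> currently being read
        self.titles = []   # finished <h1> texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.titles.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def h1_titles(html):
    """Return the text of all <h1> elements found in the given HTML string."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Assumption: invented markup standing in for response.text in the spider.
sample_html = """
<div class="content-page container">
  <div class="single-product product">
    <div class="fixed-content"><h1>Herbalife F1 Θρεπτικά Meal Bars σοκολάτα</h1></div>
  </div>
</div>
"""

expected = "Θρεπτικό πρωτεϊνούχο ρόφημα Herbalife Formula 1 υγιεινό γεύμα Café Latte 550g"

titles = h1_titles(sample_html)
print(titles)                   # every <h1> text present in the raw HTML
print(expected in sample_html)  # whether the expected title is in the raw HTML at all
```

In the spider, the same check could be run on response.text inside parse(). Separately, note that an XPath starting with // inside the for loop is evaluated against the whole document rather than against product; writing .// makes it relative to the current product node.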