I'm scraping a website to get its data. It's a WooCommerce site, and each product has multiple variations with different prices.
Scraping with BeautifulSoup, I'm able to get the whole product and variant information, but some strings are unreadable.
specific product page: https://dogo.co.il/product/כנען-חטיפים-לכלבים-במגוון-טעמים-60-100-גרם/
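import json
import requests
from bs4 import BeautifulSoup

# single_product_url is the product page URL given above
product_page = requests.get(single_product_url)
product_soup = BeautifulSoup(product_page.content, "html.parser")
product_form = product_soup.find("form", {"class": "variations_form cart"})
# The form's data-product_variations attribute holds a JSON list of variations
variations_json = json.loads(product_form["data-product_variations"])
for item in variations_json:
    attributes = item["attributes"]
    variant_title = attributes["attribute_pa_flavor"]
    print(variant_title)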
The output is: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
The JSON I get has all variant information such as 'is_in_stock', prices, and discounts for each variant.
I need more than just the variant titles - I need the whole variant data.
How do I convert "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
to a normal string?
I tried encoding and decoding - no success.
Thanks!
CodePudding user response:
You can do this with urllib.parse.unquote, which decodes percent-encoded (URL-encoded) strings. I used Python 3.x.
In [9]: import urllib.parse
In [10]: urllib.parse.unquote(
...: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
...: )
Out[10]: 'סיגר-עוף-100-גרם'
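Since the question asks for the whole variant data rather than just the titles, the same call can be applied to every attribute value. Below is a minimal sketch, assuming the parsed JSON is a list of dicts that each carry an "attributes" dict of string values, as in the question's code. The helper name decode_attributes and the sample data are hypothetical and only there for illustration.

```python
from urllib.parse import unquote


def decode_attributes(variations):
    """Return copies of the variation dicts with attribute values percent-decoded.

    Hypothetical helper: assumes each variation has an "attributes" dict
    whose values are strings (possibly percent-encoded).
    """
    decoded = []
    for item in variations:
        clean = dict(item)  # shallow copy so the input list is left untouched
        clean["attributes"] = {
            key: unquote(value)
            for key, value in item.get("attributes", {}).items()
        }
        decoded.append(clean)
    return decoded


# Illustrative sample shaped like one entry of variations_json (not real site data)
sample = [
    {
        "attributes": {
            "attribute_pa_flavor": "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
        },
        "is_in_stock": True,
    }
]

result = decode_attributes(sample)
print(result[0]["attributes"]["attribute_pa_flavor"])
```

The other fields of each variation (such as 'is_in_stock') are carried over unchanged, so the returned list can be used in place of variations_json.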