Python scraping Hebrew - how to convert string "%d7%a1%d7%99%d7%92%d7%a8" to normal

Time:08-30

So I'm scraping a website to get its data. It's a WooCommerce website, and a product has multiple variations with different prices.

Scraping with BeautifulSoup, I'm able to get the whole product and variant information, but some strings are unreadable.

specific product page: https://dogo.co.il/product/כנען-חטיפים-לכלבים-במגוון-טעמים-60-100-גרם/

import json
import requests
from bs4 import BeautifulSoup

product_page = requests.get(single_product_url)
product_soup = BeautifulSoup(product_page.content, "html.parser")

# The variations are stored as JSON in a data attribute on the form
product_form = product_soup.find("form", {"class": "variations_form cart"})
variations_json = json.loads(product_form["data-product_variations"])
for item in variations_json:
    attributes = item["attributes"]
    variant_title = attributes["attribute_pa_flavor"]
    print(variant_title)

The output is: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"

The JSON I get has all variant information such as 'is_in_stock', prices, and discounts for each variant.

I don't need only variant titles - I need the whole variant data.

How do I convert "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d" to a normal string?

I tried encoding and decoding - no success.

Thanks!

CodePudding user response:

The string is percent-encoded (URL-encoded) UTF-8. You can decode it with urllib.parse.unquote; I used Python 3.x.

In [9]: import urllib.parse

In [10]: urllib.parse.unquote(
    ...:     "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
    ...: )
Out[10]: 'סיגר-עוף-100-גרם'
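
Since you need the whole variant data and not only the title, here is a minimal sketch that applies the same unquote call to every value under "attributes" while leaving the rest of each variant untouched. The helper name decode_variation_attributes and the sample data are made up for illustration; in your script you would pass in the variations_json list from the question, assuming each entry is a dict with an "attributes" dict of string values.

```python
from urllib.parse import unquote


def decode_variation_attributes(variations):
    """Return copies of the variants with percent-encoded attribute values decoded.

    Hypothetical helper: assumes each variant is a dict whose "attributes"
    entry maps attribute names to percent-encoded strings.
    """
    decoded = []
    for variation in variations:
        item = dict(variation)  # shallow copy, so the input list is not modified
        item["attributes"] = {
            name: unquote(value)
            for name, value in variation.get("attributes", {}).items()
        }
        decoded.append(item)
    return decoded


# Made-up sample shaped like one entry of the parsed JSON (price is invented).
sample_variations = [
    {
        "attributes": {
            "attribute_pa_flavor": "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
        },
        "is_in_stock": True,
        "display_price": 25,
    }
]

decoded_variations = decode_variation_attributes(sample_variations)
print(decoded_variations[0]["attributes"]["attribute_pa_flavor"])
```

Only the attribute values are decoded on purpose: other fields in the JSON may hold URLs or HTML where a literal % sequence should stay as it is.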