parsing prices with bs4

Time:10-20

I am having trouble parsing prices from the MEC site. I can parse the product names just fine, but not the prices: soup.find() returns None for the price, so there is nothing to call .text on.

When I load the same div from a local string, it works (second code block).

Is it possible the website is blocking this, or am I doing something wrong?

import requests
from bs4 import BeautifulSoup


# Named "response" rather than "re" so it doesn't shadow the re module
response = requests.get("https://www.mec.ca/en/search?org_text=hiking boots men&text=hiking boots men")

soup = BeautifulSoup(response.text, 'lxml')
print(soup.find(class_='product__name--ellipsis').p.a.text)  # product name: works
print(soup.find(class_='qa-single-price'))  # price: prints None

Second code block, tried locally (works fine):

from bs4 import BeautifulSoup

html_doc = """
<div >
    <ul ><li >
    <span >Available prices</span>
    <span  aria-live="polite">Clearance price $139.95</span>
    <span >$139.95</span></li>
    <li  aria-hidden="true">
    </li><li  aria-live="polite">
    <span >Original price</span>
    <span ></span></li></ul></div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.find(class_='qa-single-price').text)
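Whichever page you parse, find() returns None when no element matches, and accessing .text on None raises an AttributeError. A small guard makes that case explicit. This is a minimal sketch; text_or_none is a hypothetical helper name and the markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, class_name):
    """Return the stripped text of the first element with class_name, or None."""
    # find() returns None when nothing matches, and None has no .text,
    # so check for that case before reading the text.
    element = soup.find(class_=class_name)
    if element is None:
        return None
    return element.get_text(strip=True)


# Hypothetical markup: the name element exists, the price element does not.
html = '<div class="product__name--ellipsis"><p><a>Boot</a></p></div>'
soup = BeautifulSoup(html, "html.parser")
print(text_or_none(soup, "product__name--ellipsis"))  # Boot
print(text_or_none(soup, "qa-single-price"))          # None
```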

Answer:

The price is generated dynamically by JavaScript. If you disable JavaScript in your browser, you will see the prices disappear while the product titles/names stay. That's why the price isn't in the HTML that requests receives. You can get all of the data from the JSON responses of the site's API calls instead.
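One way to check this from Python, instead of toggling JavaScript in the browser, is to test whether the price's class name occurs anywhere in the raw HTML that requests downloaded. The sketch below assumes you already have that HTML as a string (for the real page, response.text); report_markers is a hypothetical helper and the sample markup is made up. A False result is strong evidence the element is rendered client-side; a True result is weaker, since the string could also appear inside a script.

```python
def report_markers(html, markers):
    """Map each marker string to whether it occurs in the raw (un-rendered) HTML."""
    return {marker: marker in html for marker in markers}


# Hypothetical static HTML: the product name is server-rendered, while the
# price is only filled in later by JavaScript, so its class never appears.
static_html = """
<div class="product__name--ellipsis"><p><a>Example Boot</a></p></div>
<div id="price-placeholder"></div>
<script>/* price rendered client-side */</script>
"""

print(report_markers(static_html, ["product__name--ellipsis", "qa-single-price"]))
# {'product__name--ellipsis': True, 'qa-single-price': False}
```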

Answer:

The data is loaded dynamically, so requests won't render it. However, the same data is also embedded in the page in JSON format, so you can use the built-in re and json modules to find it.

There's no need to use BeautifulSoup, since the data you need is already JSON.
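The extraction step can be tried offline against a stand-in page first. The sketch below assumes the product list is assigned as window.ProductList = [...]; inside a script tag, as the answer's regex implies; the sample HTML, the product values and the extract_product_list helper are hypothetical. It escapes the dot in window.ProductList, uses re.DOTALL in case the array spans several lines, and fails with a clear message if the assignment is missing.

```python
import json
import re

# Hypothetical page: the product data sits in a script as a JS array literal
# that also happens to be valid JSON.
SAMPLE_HTML = """
<html><body>
<script>
window.ProductList = [
  {"name": "Example Boot A", "mecPrice": {"price": {"value": 199.95}}},
  {"name": "Example Boot B", "mecPrice": {"price": null,
     "highPrice": {"formattedValue": "$249.95"}}}
];
</script>
</body></html>
"""

# Non-greedy: stops at the first "];" after the opening bracket.
PRODUCT_LIST_RE = re.compile(r"window\.ProductList = (\[.*?\]);", re.DOTALL)


def extract_product_list(html):
    """Return the list assigned to window.ProductList in the page source."""
    match = PRODUCT_LIST_RE.search(html)
    if match is None:
        raise ValueError("window.ProductList not found in the page")
    return json.loads(match.group(1))


products = extract_product_list(SAMPLE_HTML)
print([p["name"] for p in products])
# ['Example Boot A', 'Example Boot B']
```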

Here's a working example:

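import re
import json
import requests


response = requests.get(
    "https://www.mec.ca/en/search?org_text=hiking boots men&text=hiking boots men"
)

# The product list is embedded in a <script> as: window.ProductList = [...];
match = re.search(r"window\.ProductList = (\[.*?\]);", response.text, re.DOTALL)
if match is None:
    raise SystemExit("window.ProductList not found; the page layout may have changed")
data = json.loads(match.group(1))

for product in data:
    name = product["name"]
    try:
        # Most items have a single numeric price
        price = product["mecPrice"]["price"]["value"]
    except TypeError:
        # Some items have no single price (subscripting it raises TypeError);
        # fall back to the formatted high price
        price = product["mecPrice"]["highPrice"]["formattedValue"]
    print("{:<70} {}".format(name, price))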

Output:

Zamberlan 309 Trail Lite Gore-Tex Hiking Boots - Men's                 279.95
Salomon Quest 4 Gore-Tex Hiking Boots - Men's                          289.95
Keen Pyrenees Hiking Boots - Men's                                     219.95
Scarpa Kailash Trek Gore-Tex Hiking Boots - Men's                      299.95
Scarpa Maverick Mid Gore-Tex Hiking Boots - Men's                      199.95
Salomon Quest Element Gore-Tex Hiking Boots - Men's                    249.95
Lowa Renegade Gore-Tex Mid Light Hiking Boots - Men's                  339.95
Scarpa Terra GTX Hiking Boots - Men's                                  244.95
Zamberlan 960 Guide Gore-Tex RR Hiking Boots - Men's                   399.95
Zamberlan 900 Rolle Evo 2 Gore-Tex Hiking Boots - Men's                $299.95
Columbia Newton Ridge Plus II Waterproof Hiking Boots - Men's          139.95
Timberland Garrison Trail Waterproof Mid Hiking Boots - Men's          169.95
...
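The try/except TypeError in the answer's loop appears to rely on subscripting a missing (None) price field raising TypeError, which would explain why one row above prints a formatted string ($299.95) instead of a bare number. The sketch below checks for that case explicitly and returns None instead of crashing when neither field is present. The mecPrice field names come from the answer's code; the product_price helper and the sample records are hypothetical.

```python
def product_price(product):
    """Return the single price value, or the formatted high price as a fallback.

    Assumes the mecPrice structure used in the answer: either
    mecPrice["price"]["value"] or mecPrice["highPrice"]["formattedValue"].
    Returns None if neither is present.
    """
    mec_price = product.get("mecPrice") or {}
    price = mec_price.get("price")
    if price is not None:
        return price.get("value")
    high = mec_price.get("highPrice") or {}
    return high.get("formattedValue")


# Hypothetical records shaped like the ones in window.ProductList.
sample = [
    {"name": "Example Boot A", "mecPrice": {"price": {"value": 199.95}}},
    {"name": "Example Boot B",
     "mecPrice": {"price": None, "highPrice": {"formattedValue": "$249.95"}}},
]
for product in sample:
    print("{:<20} {}".format(product["name"], product_price(product)))
```

Returning None for unknown shapes keeps one odd product from stopping the whole loop; if you'd rather fail loudly, keep the original try/except.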