Python BeautifulSoup4: all parsers not working?

Time:01-08

I want to scrape a website, but whenever I try to print the elements I'm scraping, I just get an empty list. I believe the problem is that the parser can't find the class in the HTML. I tried the other parsers supported by BeautifulSoup4, 'lxml' and 'html5lib', but it still doesn't work. I even tried downgrading from version 4.11.0 to 4.9.3, and it still doesn't work. Any ideas?

import requests
import random
from bs4 import BeautifulSoup

products = {
    1: "Hoodies",
    2: "Sunglasses",
    3: "Couple-T-shirts",
    4: "Wall-Stickers",
    5: "Rugs",
    6: "Dog-Bed",
    7: "Claw-Cutter",
    8: "Fur-Remover",
    9: "Led-Keyboard",
    10: "Wireless-Chargers",
    11: "Powerbank",
    12: "Game-Controller",
    13: "Portable-Speakers",
    14: "Scalp-Massager",
    15: "Blackhead-Remover",
    16: "Lash-Products",
    17: "Makeup-Kit",
    18: "Air-Tag-Tracker",
    19: "Air-Purifiers",
    20: "Pixelart",
    21: "Yoga-Mats",
    22: "Face-Masks",
    23: "Fitness-Watches",
    24: "Resistance-Bands",
    25: "Air-Purifiers",
    26: "Cell-Phone-Mounts",
    27: "Wireless-Security-Cameras",
    28: "Massage-Tools",
    29: "Air-Purifiers",
    30: "Eyeliner-Pencil",
    31: "Water-Filters",
    32: "Slow-Feeder-Dog-Bowls",
    33: "Video-Doorbells",
    34: "Solar-Outdoor-Lights",
    35: "Phone-Grip",
    36: "Slow-Feeder-Dog-Bowls",
    37: "Pajamas",
    38: "Skin-Care-Oil",
    39: "Flasks",
    40: "Monitor-Holders",
    41: "Watches",
    42: "Rings",
    43: "Monitor-Holders",
    44: "Nail-Polish",
    45: "Rice-Cooker",
}

i = random.randint(1, 45)

url = f'https://www.aliexpress.com/af/{products[i]}.html?spm=a2g0o.productlist.10000020initiative_id=SB_20230106091400&dida=y&origin=n'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html5lib')
soup.prettify()
product = soup.find_all('a', {'class': 'manhattan--container--1lP57Ag cards--gallery--2o6yJVt'})
productimage = soup.find_all('img', {'class': 'manhattan--img--36QXbtQ product-img'})
productprice = soup.find_all('div', {'class': 'manhattan--price-sale--1CCSZfK'})
productrating = soup.find_all('span', {'class': 'manhattan--evaluation--3cSMntr'})

print(soup)

productlinkstring = product[0]['href']
productlinkstring = 'https://www' + productlinkstring[4:]
productimagelinkstring = productimage[0]['src']
productimagelinkstring = 'https://' + productimagelinkstring[2:]
productpricestring = productprice[0].text
productratingsting = productrating[0].text

producttitle = products[i]
producttitlestring = producttitle.replace('-', ' ')

productsstring = products[i]
productsstringright = productsstring.replace('-', ' ')
endpoint = f'https://api.datamuse.com/words?ml={productsstringright}&max=18'

response = requests.get(endpoint)

data = response.json()

print(productlinkstring)
print(productimagelinkstring)

CodePudding user response:

I don't think there's anything wrong with your parser.

My tests indicate that the elements you're targeting are rendered with JavaScript, and that's why they're not found in the HTML that requests downloads.
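One quick way to check this kind of thing yourself is to look at where a class name shows up in the raw HTML: on a real tag, only inside script text, or nowhere at all. The snippet below is a minimal sketch of that check. It uses only the standard library, and the helper names (ClassFinder, where_is_class) and the two sample HTML strings are made up for illustration, they are not taken from the AliExpress page.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the names of tags whose class attribute contains a given token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_token in classes:
            self.matches.append(tag)


def where_is_class(html_text, class_token):
    """Return 'markup', 'text only' or 'absent' for class_token in html_text."""
    finder = ClassFinder(class_token)
    finder.feed(html_text)
    if finder.matches:
        return "markup"      # a real element carries the class
    if class_token in html_text:
        return "text only"   # mentioned somewhere (e.g. in a script), but not on a tag
    return "absent"


# Hypothetical sample pages, for illustration only
static_html = '<div class="price-sale big">US $1</div>'
dynamic_html = '<div id="root"></div><script>render({cls: "price-sale"})</script>'

print(where_is_class(static_html, "price-sale"))
print(where_is_class(dynamic_html, "price-sale"))
```

If the class you are after comes back as anything other than "markup" for page.text, switching parsers will not help, because the element is simply not in the downloaded document.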


The data you want seems to be in one of the script tags though. I could target that tag with

scriptTag = soup.select_one('meta[name="aplus-auto-exp"]~script')

and if any other parser had been used, the JavaScript code could probably be obtained simply with jScript = scriptTag.get_text(), but with the html5lib parser I had to stringify the tag and strip the tag name:

jScript = scriptTag.prettify().strip()
jScript = jScript.strip('<script').split('>', 1)[-1].strip('</script>').strip()

And then I extracted the list of products with findObj_inJS, which uses slimit to parse JavaScript:

itemList = findObj_inJS(jScript, '"itemList"')['content']
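If you'd rather not pull in a JavaScript parser, a different technique can work when the embedded object happens to be valid JSON: find the key in the script text and let json.JSONDecoder.raw_decode read one value starting right after the colon. This is only a sketch of that alternative, not what findObj_inJS does. The helper name extract_json_after_key and the sample_js string are assumptions for illustration, and it will fail on JavaScript literals that are not strict JSON (unquoted keys, single quotes, trailing commas).

```python
import json


def extract_json_after_key(js_source, key):
    """Return the JSON value that follows '"key":' in js_source, or None."""
    marker = json.dumps(key)          # key with surrounding double quotes
    decoder = json.JSONDecoder()
    pos = js_source.find(marker)
    while pos != -1:
        i = pos + len(marker)
        # skip whitespace between the key and the colon
        while i < len(js_source) and js_source[i].isspace():
            i += 1
        if i < len(js_source) and js_source[i] == ":":
            i += 1
            # raw_decode does not skip leading whitespace, so do it here
            while i < len(js_source) and js_source[i].isspace():
                i += 1
            try:
                value, _end = decoder.raw_decode(js_source, i)
                return value
            except json.JSONDecodeError:
                pass                  # not valid JSON here, try the next occurrence
        pos = js_source.find(marker, pos + 1)
    return None


# Hypothetical script content, shaped like the call above (an object with a 'content' list)
sample_js = 'window._data_ = {"mods": {"itemList": {"content": [{"productId": "1"}, {"productId": "2"}]}}};'

item_list = extract_json_after_key(sample_js, "itemList")["content"]
print(len(item_list))
```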

itemList contained 60 items. The first 2 items are prettified as JSON below:

[
    {
        "itemType": "productV3",
        "productType": "natural",
        "nativeCardType": "nt_srp_cell_g",
        "itemCardType": "manhattan",
        "productId": "3256804896450950",
        "lunchTime": "2022-12-20 00:00:00",
        "image": {
            "imgUrl": "//ae01.alicdn.com/kf/S39f71cffe79a4a6981083f57128103c9d/Anti-Gravity-Humidifier-Diffuser-Water-Drop-Falling-Remote-Control-Mini-Mist-Maker-Humidifier-Air-Purifiers-Droplets.jpg_220x220xz.jpg",
            "imgWidth": 220,
            "imgHeight": 220,
            "imgType": "0"
        },
        "title": {
            "seoTitle": "Anti Gravity Humidifier Diffuser Water Drop Falling Remote Control Mini Mist Maker Humidifier Air Purifiers Droplets Upflow USB",
            "displayTitle": "Anti Gravity Humidifier Diffuser Water Drop Falling Remote Control Mini Mist Maker Humidifier Air Purifiers Droplets Upflow USB",
            "shortTitle": false
        },
        "prices": {
            "skuId": "12000031580241123",
            "pricesStyle": "default",
            "builderType": "skuCoupon",
            "currencySymbol": "US $",
            "prefix": "Sale price:",
            "salePrice": {
                "discount": -1,
                "minPriceDiscount": 50,
                "priceType": "sale_price",
                "currencyCode": "USD",
                "minPrice": 26,
                "minPriceType": 2,
                "formattedPrice": "US $26.00"
            },
            "taxRate": "0"
        },
        "sellingPoints": [
            {
                "sellingPointTagId": "m0000063",
                "tagStyleType": "default",
                "tagContent": {
                    "displayTagType": "text",
                    "tagText": "Extra 1% off with coins",
                    "tagStyle": {
                        "color": "#FD384F",
                        "position": "2"
                    }
                },
                "source": "flexiCoin_new_atm"
            },
            {
                "sellingPointTagId": "m0000064",
                "tagStyleType": "default",
                "tagContent": {
                    "displayTagType": "text",
                    "tagText": "Free Shipping",
                    "tagStyle": {
                        "color": "#009966",
                        "position": "4"
                    }
                },
                "source": "Free_Shipping_atm"
            }
        ],
        "store": {
            "storeId": 1102377281,
            "aliMemberId": 2668032633,
            "storeName": "Yousmile Life Store",
            "storeUrl": "//www.aliexpress.com/store/1102377281"
        },
        "trace": {
            "pdpParams": {
                "pdp_cdi": "{\"traceId\":\"212243c016730635978473504d06bf\",\"itemId\":\"3256804896450950\",\"fromPage\":\"search\",\"skuId\":\"12000031580241123\",\"shipFrom\":\"US\",\"order\":\"0\",\"star\":\"\",\"freeShip\":\"true\"}",
                "pdp_npi": "2@dis!USD!52.0!26.0!!!!!@212243c016730635978473504d06bf!12000031580241123!sea",
                "pdp_perf": "main_img=//ae01.alicdn.com/kf/S39f71cffe79a4a6981083f57128103c9d.jpg",
                "pdp_ext_f": "{\"sku_id\":\"12000031580241123\"}"
            },
            "exposure": {
                "displayCategoryId": "",
                "postCategoryId": "625",
                "selling_point": "m0000063,m0000064",
                "algo_exp_id": "77f9a225-26b3-4384-990c-3f411893324d-0"
            },
            "click": {
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "haveSellingPoint": "true"
            },
            "detailPage": {
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "algo_exp_id": "77f9a225-26b3-4384-990c-3f411893324d-0"
            },
            "custom": {},
            "utLogMap": {
                "original_price_type": "offer",
                "formatted_price": "US $26.00",
                "csp": "26.0,1",
                "x_object_type": "productV3",
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "hit_19_forbidden": false,
                "is_detail_next": "1",
                "model_ctr": 0.06788578629493713,
                "sku_id": "12000031580241123",
                "mixrank_success": "false",
                "custom_group": 3,
                "sku_ic_tags": "[]",
                "is_adult_certified": false,
                "mixrank_enable": "false",
                "ump_atmospheres": "none",
                "oip": "52.0,0",
                "selling_point": "m0000063,m0000064",
                "original_price_strategy": "sku_opt",
                "x_object_id": "1005005082765702"
            }
        }
    },
    {
        "itemType": "productV3",
        "productType": "natural",
        "nativeCardType": "nt_srp_cell_g",
        "itemCardType": "manhattan",
        "productId": "3256803823899352",
        "lunchTime": "2022-03-09 00:00:00",
        "image": {
            "imgUrl": "//ae01.alicdn.com/kf/S5d14059b4d364bf5a41d2c08a5d6e735y/Portable-Air-Purifier-Anion-Air-Purification-Xiomi-Air-Freshener-Ionizer-Cleaner-Dust-Cigarette-Smoke-Remover-Toilet.jpg_220x220xz.jpg",
            "imgWidth": 220,
            "imgHeight": 220,
            "imgType": "0"
        },
        "title": {
            "seoTitle": "Portable Air Purifier Anion Air Purification Xiomi Air Freshener Ionizer Cleaner Dust Cigarette Smoke Remover Toilet Deodorant",
            "displayTitle": "Portable Air Purifier Anion Air Purification Xiomi Air Freshener Ionizer Cleaner Dust Cigarette Smoke Remover Toilet Deodorant",
            "shortTitle": false
        },
        "prices": {
            "skuId": "12000027730379754",
            "pricesStyle": "default",
            "builderType": "skuCoupon",
            "currencySymbol": "US $",
            "prefix": "Sale price:",
            "originalPrice": {
                "priceType": "original_price",
                "currencyCode": "USD",
                "minPrice": 20.8,
                "minPriceType": 1,
                "formattedPrice": "US $20.80"
            },
            "salePrice": {
                "discount": -1,
                "minPriceDiscount": 95,
                "priceType": "sale_price",
                "currencyCode": "USD",
                "minPrice": 0.99,
                "minPriceType": 2,
                "formattedPrice": "US $0.99"
            },
            "taxRate": "0"
        },
        "sellingPoints": [
            {
                "sellingPointTagId": "m0000040",
                "tagStyleType": "default",
                "tagContent": {
                    "displayTagType": "image",
                    "tagImgUrl": "https://ae01.alicdn.com/kf/S8cbad032762b405b8ebd8f30bca4bc83u/338x64.png",
                    "tagImgWidth": 338,
                    "tagImgHeight": 64,
                    "tagStyle": {
                        "color": "#FD384F",
                        "position": "1"
                    }
                },
                "source": "new_user_platform_allowance_atm"
            },
            {
                "sellingPointTagId": "m0000064",
                "tagStyleType": "default",
                "tagContent": {
                    "displayTagType": "text",
                    "tagText": "Free Shipping",
                    "tagStyle": {
                        "color": "#009966",
                        "position": "4"
                    }
                },
                "source": "Free_Shipping_atm"
            },
            {
                "sellingPointTagId": "1000013764",
                "tagStyleType": "default",
                "tagContent": {
                    "displayTagType": "text",
                    "tagText": "Free Return",
                    "tagStyle": {
                        "color": "#009966",
                        "position": "4"
                    }
                },
                "source": "Free_return_atm"
            }
        ],
        "evaluation": {
            "starRating": 4.4,
            "starUrl": "https://ae01.alicdn.com/kf/S567d6bf538214abf95c1e5825c7e6a05t/48x48.png",
            "starWidth": 48,
            "starHeight": 48
        },
        "trade": {
            "tradeDesc": "1340 sold"
        },
        "store": {
            "storeId": 1101982761,
            "aliMemberId": 2657161543,
            "storeName": "BOYALIGE Official Store",
            "storeUrl": "//www.aliexpress.com/store/1101982761"
        },
        "trace": {
            "pdpParams": {
                "pdp_cdi": "{\"traceId\":\"212243c016730635978473504d06bf\",\"itemId\":\"3256803823899352\",\"fromPage\":\"search\",\"skuId\":\"12000027730379754\",\"shipFrom\":\"CN\",\"order\":\"1340\",\"star\":\"4.4\",\"freeShip\":\"true\",\"shipSellingPoint\":\"freeReturn\"}",
                "pdp_npi": "2@dis!USD!20.8!0.99!!!!!@212243c016730635978473504d06bf!12000027730379754!sea",
                "pdp_perf": "main_img=//ae01.alicdn.com/kf/S5d14059b4d364bf5a41d2c08a5d6e735y.jpg",
                "pdp_ext_f": "{\"sku_id\":\"12000027730379754\"}"
            },
            "exposure": {
                "displayCategoryId": "",
                "postCategoryId": "613",
                "selling_point": "m0000040,m0000064,1000013764",
                "algo_exp_id": "77f9a225-26b3-4384-990c-3f411893324d-1"
            },
            "click": {
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "haveSellingPoint": "true"
            },
            "detailPage": {
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "algo_exp_id": "77f9a225-26b3-4384-990c-3f411893324d-1"
            },
            "custom": {},
            "utLogMap": {
                "original_price_type": "offer",
                "formatted_price": "US $0.99",
                "csp": "0.99,1",
                "x_object_type": "productV3",
                "algo_pvid": "77f9a225-26b3-4384-990c-3f411893324d",
                "hit_19_forbidden": false,
                "is_detail_next": "1",
                "model_ctr": 0.21320748329162598,
                "sku_id": "12000027730379754",
                "mixrank_success": "false",
                "custom_group": 3,
                "sku_ic_tags": "[]",
                "is_adult_certified": false,
                "mixrank_enable": "false",
                "ump_atmospheres": "new_user_platform_allowance,none",
                "oip": "20.8,1",
                "selling_point": "m0000040,m0000064,1000013764",
                "original_price_strategy": "sku_opt",
                "x_object_id": "1005004010214104"
            }
        },
        "config": {
            "prices": {
                "color": "#FD384F"
            }
        }
    }
]

You could write a function to reduce it to only the information you want, something like:

def reduceProductInfo(origInf):
    # productId is required: without it there is nothing to build a link from
    try:
        prodId = origInf['productId']
    except (KeyError, TypeError):
        return {'errorMsg': 'productId expected', 'orig': origInf}
    pDets = {
        'id': prodId, 'link': f'https://www.aliexpress.com/item/{prodId}.html'
    }
    # for the optional fields, store an error marker instead of raising
    try: pDets['title'] = origInf["title"]["displayTitle"]
    except Exception as e: pDets['title'] = f'!{type(e)} - {e}!'
    try: pDets['salePrice'] = origInf["prices"]["salePrice"]["formattedPrice"]
    except Exception as e: pDets['salePrice'] = f'!{type(e)} - {e}!'
    try: pDets['imageLink'] = origInf["image"]["imgUrl"]
    except Exception as e: pDets['imageLink'] = f'!{type(e)} - {e}!'
    # imgUrl is protocol-relative ('//...'), so prepend the scheme
    if pDets['imageLink'][:2] == '//':
        pDets['imageLink'] = f"https:{pDets['imageLink']}"
    return pDets

and then just reduce the whole list with

productList = [reduceProductInfo(item) for item in itemList]

Each successfully reduced item in productList is then a small dictionary with the keys id, link, title, salePrice and imageLink.
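A side note on the links: the imgUrl and storeUrl values in the data above start with //, meaning they are protocol-relative. Prepending the scheme by hand works, but urllib.parse.urljoin from the standard library handles that case along with site-relative and already-absolute links. A small sketch, where the absolutize helper and the example paths are made up for illustration:

```python
from urllib.parse import urljoin

# Base URL used to resolve relative links (assumed: the site being scraped)
BASE = "https://www.aliexpress.com/"


def absolutize(link, base=BASE):
    """Resolve protocol-relative, site-relative or absolute links against base."""
    return urljoin(base, link)


print(absolutize("//ae01.alicdn.com/kf/example.jpg"))        # takes the scheme from base
print(absolutize("/item/123.html"))                          # takes scheme and host from base
print(absolutize("https://ae01.alicdn.com/kf/example.jpg"))  # already absolute, unchanged
```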

If you want to go with this approach, you might want to check out my getNestedVal function as well - it's rather useful for quickly figuring out the sequence of keys and indices needed to get a nested value.
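To give a rough idea of what a nested lookup helper can look like, here is a minimal sketch. It is not the getNestedVal function itself, just a hypothetical get_nested that walks a list of keys and indices and returns a default when any step is missing. The sample item is a trimmed copy of the first product shown above.

```python
def get_nested(obj, path, default=None):
    """Follow a sequence of dict keys / list indices; return default if a step fails."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Trimmed copy of the first item from the list above
item = {
    "productId": "3256804896450950",
    "prices": {"salePrice": {"formattedPrice": "US $26.00"}},
    "sellingPoints": [
        {"tagContent": {"tagText": "Extra 1% off with coins"}},
        {"tagContent": {"tagText": "Free Shipping"}},
    ],
}

print(get_nested(item, ["prices", "salePrice", "formattedPrice"]))
print(get_nested(item, ["sellingPoints", 1, "tagContent", "tagText"]))
# The first item has no "evaluation" block, so this falls back to the default
print(get_nested(item, ["evaluation", "starRating"], default="no rating"))
```

This keeps the try/except noise in one place, which helps when some items lack fields that others have (the first item above has no evaluation, the second one does).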

CodePudding user response:

I don't know exactly what your goal in scraping is, but have you tried

soup = BeautifulSoup(page.content, 'html.parser')

After that, your code works perfectly for me and prints valid URLs at the end of the script:

https://www.aliexpress.com/item/1005004643645808.html?algo_pvid=36f2290b-a7f6-4a43-8d4e-cb427a255b22&algo_exp_id=36f2290b-a7f6-4a43-8d4e-cb427a255b22-0&pdp_ext_f={"sku_id":"12000030390398106"}&pdp_npi=2@dis!EUR!3.8!1.79!!!!!@2100b20d16730874440116525d06ef!12000030390398106!sea&curPageLogUid=TO7uEnR4RcFE
https://ae01.alicdn.com/kf/Sba697dac347c425bbbc16a0f71d8a534W/Mini-Tracking-Device-Tracking-Air-Tag-Key-Child-Finder-Pet-Tracker-Location-Smart-Bluetooth-Tracker-Car.jpg_220x220xz.jpg