How to loop over a Load More button while scraping in Python?

Time:07-04

I've tried the code below to loop over the Load More button and load all the products on the page, but I get the error "Button is not clickable at point". After all the products have loaded, the Load More button doesn't disappear, so my loop never ends: the button is still on the page but is no longer clickable, which is what triggers the error. Please help me fix my code.

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

url = 'https://mamaearth.in/shop'
driver = webdriver.Chrome(service=Service(r"Enter your path"))
driver.get(url)

# Locate the Load More button and click it once through JavaScript
button = driver.find_element(By.XPATH, '//button[@]')
driver.execute_script("arguments[0].click();", button)

j = 0  # number of products already printed
try:
    while button.is_displayed():
        button.click()
        time.sleep(5)
        prodd = driver.find_elements(By.CLASS_NAME, "uniquewhite")
        newlist = prodd[j:]  # only the products added since the last pass
        for productt in newlist:
            link = productt.find_element(By.TAG_NAME, 'a').get_attribute('href')
            print(link)
        j = len(prodd)
        time.sleep(5)
except StaleElementReferenceException:
    pass
driver.quit()

Answer:

Try adding a check:

if newlist:
    for productt in ...
else:
    break

so that if no new entries appear after clicking the Load More button, the loop breaks.
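Putting that check into a complete loop, a minimal sketch might look like the following. It keeps the loop logic separate from Selenium so it can be tried without a browser. It assumes `get_items` returns every product currently on the page in page order and that `click_load_more` triggers the next batch and waits for it; `collect_until_exhausted` and `max_clicks` are hypothetical names, not part of any library.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def collect_until_exhausted(
    get_items: Callable[[], Sequence[T]],
    click_load_more: Callable[[], None],
    max_clicks: int = 100,
) -> List[T]:
    """Keep clicking "Load More" until a click adds no new items.

    get_items() is assumed to return every item currently on the page,
    in page order; click_load_more() is assumed to load the next batch
    and wait for it to render.
    """
    collected: List[T] = list(get_items())
    for _ in range(max_clicks):  # hard cap in case the page never settles
        click_load_more()
        items = get_items()
        new_items = items[len(collected):]  # only what this click added
        if not new_items:
            # The button may still be on the page, but nothing new loaded: stop.
            break
        collected.extend(new_items)
    return collected


if __name__ == "__main__":
    # Fake page that grows by two items per click and stops at six items.
    page: List[int] = [1, 2]

    def fake_click() -> None:
        if len(page) < 6:
            page.extend([len(page) + 1, len(page) + 2])

    print(collect_until_exhausted(lambda: list(page), fake_click))  # prints [1, 2, 3, 4, 5, 6]
```

With Selenium 4 you might pass `lambda: driver.find_elements(By.CLASS_NAME, "uniquewhite")` as `get_items`, and as `click_load_more` a small function that runs `driver.execute_script("arguments[0].click();", button)` followed by `time.sleep(5)`. Clicking through JavaScript, as the question already does for the first click, may avoid the "not clickable at point" error, but that has not been checked against this site.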

Answer:

You can grab all the product links from that page with the requests module by querying the paginated products API directly. The following implementation fetches the product links from every page that shows up when you click the Load More button.

import requests

base = 'https://mamaearth.in/product/{}'
link = 'https://mmrth-nd-api.honasa-production.net/v1/products/shopAllProducts'

params = {
    'pagenumber': 1,
    'pagesize': '20',
    'categoryId': '-1'
}
headers = {
    'Referer': 'https://mamaearth.in/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
with requests.Session() as s:
    s.headers.update(headers)
    while True:
        res = s.get(link, params=params)
        try:
            container = res.json()['response']['list']['entities']['products']
        except KeyError:
            # Past the last page the expected keys are missing: stop paging
            break
        if not container:
            # An empty page also means there is nothing left to load
            break

        # Each product entry carries a slug that completes the product URL
        for val in container.values():
            print(base.format(val['slug']))

        # Move on to the next page of results
        params['pagenumber'] += 1

Output (truncated):

https://mamaearth.in/product/eggplex-conditioner-with-egg-protein-collagen-for-strength-shine-250-ml
https://mamaearth.in/product/eggplex-shampoo-with-egg-protein-collagen-for-strength-and-shine-250-ml
https://mamaearth.in/product/ubtan-face-wash-with-turmeric-saffron-for-tan-removal-150-ml
https://mamaearth.in/product/vitamin-c-face-wash-with-vitamin-c-and-turmeric-for-skin-illumination-250ml-2
https://mamaearth.in/product/skin-illuminate-sunscreen-gel-spf-50-with-vitamin-c-turmeric-for-uva-b-protection-50-g
https://mamaearth.in/product/vitamin-c-foaming-face-wash-combo-pack-with-refill-150ml-150ml
https://mamaearth.in/product/sunscreen-body-lotion-spf-30-300-ml-vitamin-c
https://mamaearth.in/product/sunscreen-body-lotion-spf-30-300-ml-ubtan
https://mamaearth.in/product/sunscreen-body-lotion-spf-30-300-ml-aloe-vera
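The paging loop can also be pulled out into a generator so it is testable without network access. The sketch below is one way to do that, stopping either when the expected keys are missing or when a page comes back empty. `iter_product_slugs` and `fetch_page` are hypothetical names, and the response shape (`['response']['list']['entities']['products']`, a dict of products each with a `slug` key) is taken from the answer's code rather than from any documented API.

```python
from typing import Any, Callable, Dict, Iterator


def iter_product_slugs(
    fetch_page: Callable[[int], Dict[str, Any]], start: int = 1
) -> Iterator[str]:
    """Yield product slugs page by page until the API runs out of products.

    fetch_page(n) is assumed to return the decoded JSON for page n, shaped
    like the answer's response: ['response']['list']['entities']['products']
    holding a dict of products, each with a 'slug' key.
    """
    page = start
    while True:
        data = fetch_page(page)
        try:
            products = data["response"]["list"]["entities"]["products"]
        except (KeyError, TypeError):
            return  # expected keys missing (or a None value): past the last page
        if not products:
            return  # empty page: nothing left to load
        for product in products.values():
            yield product["slug"]
        page += 1
```

With the session from the answer, something like `fetch_page = lambda n: s.get(link, params={**params, 'pagenumber': n}).json()` followed by `for slug in iter_product_slugs(fetch_page): print(base.format(slug))` should print the same links.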