Scraping a web page from a website built with React and jQuery

Time:09-17

I need to extract the product name, price and default color from the following link: Link

However, every time I run the script below, the results differ: sometimes all three values are printed, sometimes only one or two of them, sometimes none. This happens even with WebDriverWait in place.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains as AC
#import json
from bs4 import BeautifulSoup as soup
browser = webdriver.Firefox()
browser.get('https://shop.mango.com/gb/women/skirts-midi/midi-satin-skirt_17042020.html?c=99')

wait = WebDriverWait(browser, 100).until(EC.presence_of_all_elements_located)

s = soup(browser.page_source, 'html.parser')
name = s.select('.product-name')[0].getText()
price = s.select('.product-sale')[0].getText()
color = s.select('.colors-info')[0].getText()
print(name, price, color)

Could you please advise how to extract all three elements reliably? If I download the page with requests or scrapy instead, the above elements are missing.

CodePudding user response:

A few points:

  1. There is an "accept cookies" button, which has to be clicked before you can proceed.

  2. After the cookies button, a modal appears; its close button also has to be clicked before you can proceed.

  3. Use explicit waits; for this case, wait for the visibility of each element.

  4. Prefer CSS selectors over XPath.

  5. Maximize the browser window.
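
Point 3 is probably the key one. `WebDriverWait.until` expects a callable that takes the driver and returns something truthy once the condition holds. The question passes `EC.presence_of_all_elements_located` itself, without a locator, so the call most likely returns a truthy object straight away and nothing is actually waited for. The sketch below is not Selenium's implementation, only a pure-Python imitation of a polling loop; `wait_until`, `FakePage`, `element_ready` and `always_truthy` are made-up names for illustration.

```python
import time


def wait_until(condition, subject, timeout=2.0, poll=0.05):
    """Poll condition(subject) until it returns something truthy or time runs out.

    Returns the truthy result together with the number of polls it took.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        result = condition(subject)
        if result:
            return result, polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a page whose content appears after a delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find(self, css_class):
        # Returns None until the simulated rendering has finished.
        return css_class if time.monotonic() >= self._ready_at else None


def element_ready(page):
    # A real condition: truthy only once the element can be found.
    return page.find("product-name")


def always_truthy(page):
    # Imitates a condition that never inspects the page: its return value
    # is an arbitrary object, which is always truthy.
    return object()


_, polls_bad = wait_until(always_truthy, FakePage(ready_after=0.2))
found, polls_good = wait_until(element_ready, FakePage(ready_after=0.2))
print(polls_bad, found, polls_good > 1)
```

The always-truthy condition returns on the first poll even though the fake page is not ready yet, which would match the "sometimes one or two values, sometimes none" behaviour described in the question.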

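Code:

from selenium.common.exceptions import TimeoutException

# With Selenium 4.6+ the driver binary is located automatically; on older
# versions, pass the path to chromedriver explicitly.
driver = webdriver.Chrome()
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)  # explicit wait: up to 20 seconds per condition

driver.get('https://shop.mango.com/gb/women/skirts-midi/midi-satin-skirt_17042020.html?c=99')

# The cookie banner may not always appear, so a timeout here is not an error.
try:
    print("to accept cookies")
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='onetrust-accept-btn-handler']"))).click()
except TimeoutException:
    pass

# Same for the modal (a country confirmation, judging by its class name).
try:
    print("to close modal pop up windows")
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[class*='closeModal'][class$='confirmacionPais']"))).click()
except TimeoutException:
    pass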


name = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'product-name'))).text
price = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'product-sale'))).text
color = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'colors-info'))).text

print(name, price, color)
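
The two try/except blocks above share one pattern: wait for an optional overlay, click it if it shows up, and carry on if it does not. One possible way to factor that out is sketched below. `click_if_present` is a hypothetical helper, not part of Selenium, and the fallback exception class exists only so that the sketch also runs where Selenium is not installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    class TimeoutException(Exception):
        """Stand-in used only when Selenium is not installed."""


def click_if_present(wait, condition, label):
    """Click the element that `condition` resolves to.

    Returns True if the element appeared and was clicked, False if the
    wait timed out. Any other exception is left to propagate.
    """
    try:
        wait.until(condition).click()
    except TimeoutException:
        print(f"{label}: not shown, skipping")
        return False
    print(f"{label}: clicked")
    return True


# Usage with the objects from the code above (assumed to exist already):
# click_if_present(
#     wait,
#     EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='onetrust-accept-btn-handler']")),
#     "cookie banner",
# )
```

Catching only the timeout keeps unrelated failures, such as a wrong selector type or a closed browser, visible instead of silently swallowing them.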

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output:

to accept cookies
to close modal pop up windows
Midi satin skirt £39.99 Black
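
As for requests or scrapy returning a page without these elements: that is what one would expect if the markup is built in the browser by JavaScript, which the React mention in the title suggests, since those tools only fetch the initial HTML. A rough way to check is to scan the raw response for the class names. The sketch below uses only the standard library; `ClassCollector` and `missing_classes` are illustrative names, and the `raw_html` string is an invented example of a JavaScript shell, not the shop's actual response.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_classes(html, wanted):
    """Return the wanted class names that do not occur in the given HTML."""
    collector = ClassCollector()
    collector.feed(html)
    return sorted(set(wanted) - collector.classes)


# Invented example of what a client-side-rendered page may look like
# before any JavaScript has run.
raw_html = (
    '<html><body><div id="app" class="root"></div>'
    '<script src="bundle.js"></script></body></html>'
)
wanted = ["product-name", "product-sale", "colors-info"]
print(missing_classes(raw_html, wanted))
```

If all three names come back as missing for the real response body, the data is not in the static HTML and a browser-driven approach like the one above is needed.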