How to use @FindAll and @FindBy in Selenium by XPath for Web Scraping

Time:04-09

Website

I use this method to scrape elements:

name = driver.find_elements(By.XPATH, '//div[@]/a/em/font[3]/font')

But when I want the inner product details, I have to go to that item's own page (the single product page) to scrape them.

There I can only access that one item's data. It gives me one item of data, but I want to scrape the data for all the items.

I know how to scrape all the outer details of the products (marked with the arrow), but I do not know how to scrape the inner details of all the items, which are shown in picture 2 (next link).

I want to scrape the details indicated by the red arrow, using XPath.

CodePudding user response:

To scrape the inner details of the products, you have to click on them one by one. Each product opens in a new tab, so you need to switch to that tab before you can scrape it, then close it and switch back to the search results.

Code:

driver.maximize_window()
wait = WebDriverWait(driver, 20)

driver.get("https://search.jd.com/Search?keyword=两件套套装裙&enc=utf-8&wq=两件套套装裙&pvid=c35452079d6240b3a5fab6c585b53856")

all_products = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//img[@data-img and not(@data-url) and @height='220']")))

print(len(all_products))
i = 1
for product in all_products:
    # Re-locate the i-th product image on each pass instead of reusing a possibly stale element
    prd = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//img[@data-img and not(@data-url) and @height='220'])[{i}]")))
    driver.execute_script("arguments[0].scrollIntoView(true);", prd)
    prd.click()
    # The product page opens in a new tab, so switch to it
    all_handles = driver.window_handles
    driver.switch_to.window(all_handles[1])
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.sku-name"))).get_attribute('innerText'))
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.p-price"))).text)
    # Close the product tab and return to the search results tab
    driver.close()
    driver.switch_to.window(all_handles[0])
    i = i + 1

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

The website responds very slowly, so I could not run the script to completion. However, the above code should work fine in your region.

Also, Stack Overflow is not letting me post the output because it contains some special characters. Please see the comment for the output.
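The loop above assumes the product tab is always all_handles[1]. As a minimal sketch of a slightly more defensive variant, the new tab can be found by comparing the window handles before and after the click. The helper names find_new_handle, collect_details, and read_details are hypothetical and not part of Selenium; the driver argument is assumed to expose window_handles, current_window_handle, switch_to.window() and close() as a Selenium WebDriver does.

```python
def find_new_handle(handles_before, handles_after):
    """Return the one handle that appears in handles_after but not in handles_before."""
    new_handles = [h for h in handles_after if h not in handles_before]
    if len(new_handles) != 1:
        raise ValueError(f"expected exactly one new tab, found {len(new_handles)}")
    return new_handles[0]


def collect_details(driver, products, read_details):
    """Click each product, read its details in the new tab, then return to the main tab.

    read_details is a caller-supplied function (hypothetical) that takes the
    driver and returns whatever should be stored for the current product page.
    """
    results = []
    main_handle = driver.current_window_handle
    for product in products:
        handles_before = list(driver.window_handles)
        product.click()
        # On a slow site the new tab may not exist yet; an explicit wait
        # (for example on the number of windows) may be needed here.
        new_handle = find_new_handle(handles_before, driver.window_handles)
        driver.switch_to.window(new_handle)
        try:
            results.append(read_details(driver))
        finally:
            # Always close the product tab and go back, even if reading failed
            driver.close()
            driver.switch_to.window(main_handle)
    return results
```

With the code from the answer, read_details could simply return the text of div.sku-name and span.p-price, so the results end up in a list instead of being printed one by one.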
