Can each card's link be opened in Selenium while scraping an infinite-scroll page?

Time:09-14

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import pandas as pd
import time as t

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")

chrome_options.add_argument("--window-size=1280,720")

webdriver_service = Service("chromedriver.exe") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
url = 'https://iremedy.com/search?query=Vital Signs Monitors'
browser.get(url)
items_list = []
while True:
    elements_on_page = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[class^="card"]')))
    print(len(elements_on_page), 'total items found')
    if len(elements_on_page) > 100:
        print('more than 100 found, stopping')
        break
    footer = wait.until (EC.presence_of_element_located((By.CSS_SELECTOR, 'footer[id="footer"]')))
    footer.location_once_scrolled_into_view
    t.sleep(2)
for el in elements_on_page:
    title = el.find_element(By.CSS_SELECTOR, 'h3[]')
    price = el.find_element(By.CSS_SELECTOR, 'div[]')
    items_list.append((title.text.strip(), price.text.strip()))
df = pd.DataFrame(items_list, columns = ['Item', 'Price'])
print(df)

This website uses infinite scrolling. The code above only scrolls the page and copies the information shown on each card. I want to visit each product card's link and also copy the image links, product name, categories, short description, price, availability, SKU and Additional Information. First, several product links pop up a box with additional similar products, and I don't know how to dismiss it. I want to copy the required information, go back, and resume from the next product. Is that possible?

I want to make this whole process as fast as possible. Another thought that crossed my mind is to first collect all the product links on the infinite-scroll page, down to the last product, and then run multiple threads (if splitting the work across threads is a good solution) to scrape those links. I want to know what the fastest way is and how to do it. Thanks.

Answer:

Here is a way to move through the items one by one, click on each, and access its 'more details' pop-up. Bear in mind this is just a demonstration to put you on the right path: you will have to select the data you need from there and save it however you want. To make this work, I had to delete a couple of elements in the page that would otherwise have intercepted the click.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import pandas as pd
import time as t

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")

chrome_options.add_argument("--window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 30)
url = 'https://iremedy.com/search?query=Vital Signs Monitors'
browser.get(url)
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[aria-label="Accept All"]'))).click()
    print('accepted cookies')
except Exception as e:
    print('no cookie button!')

# Remove the overlays that would otherwise intercept clicks on the cards.
# find_elements returns an empty list when an overlay is absent, so nothing is raised.
for selector in ('div[id="evidenceClientContainer"]', 'div[data-id="zsalesiq"]'):
    for overlay in browser.find_elements(By.CSS_SELECTOR, selector):
        browser.execute_script("arguments[0].remove();", overlay)
items_list = []
while True:
    elements_on_page = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[class^="card"]')))
    print(len(elements_on_page), 'total items found')
    if len(elements_on_page) > 100:
        print('more than 100 found, stopping')
        break
    footer = wait.until (EC.presence_of_element_located((By.CSS_SELECTOR, 'footer[id="footer"]')))
    footer.location_once_scrolled_into_view
    t.sleep(2)
for el in elements_on_page:
    title = el.find_element(By.CSS_SELECTOR, 'h3[]')
    price = el.find_element(By.CSS_SELECTOR, 'div[]')
    el.find_element(By.CSS_SELECTOR, 'button[]').click()
    try:
        product_extra_info = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'ul[]'))).text.strip()
    except Exception as e:
        print(e)
        product_extra_info = 'no details'
    print(title.text.strip(), price.text.strip())
    print(product_extra_info)
    close_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div[]'))).find_element(By.CSS_SELECTOR, 'button[]')
    close_button.click()
    print('closed extra info popup, moving to the next element')
    t.sleep(2)
    print('_________________________')
    
    items_list.append((title.text.strip(), price.text.strip()))
df = pd.DataFrame(items_list, columns = ['Item', 'Price'])
print(df)

This will print in the terminal:

accepted cookies
10 total items found
20 total items found
30 total items found
40 total items found
50 total items found
60 total items found
70 total items found
80 total items found
90 total items found
100 total items found
110 total items found
more than 100 found, stopping
Edan M3A Vital Signs Monitors $3,016.95
M3A Vital Signs Monitor/Meter with lightweight, portable design provides essential measurement of patient's vital signs
Suitable for adult, pediatric and neonatal patients
Features automatic, manual, continuous and average BP modes
Includes high-resolution LED screen (6"), real-time measurements and trend display, and user-friendly interface
2-year manufacturer's warranty for monitor; 1-year for accessories
closed extra info popup, moving to the next element
_________________________
M3 Vital Signs Monitors by Edan Instruments $2,783.95
M3 Vital Signs Monitors/Meters provide essential measurement of patients' vital signs
Lightweight, portable design includes high-resolution LED screen (6")
User-friendly interface offers real-time measurements and trend display
Auto, manual, continuous and average BP modes
If purchased in conjunction with CQCMDLN0002, the CareConnection vitals integration solution, the M3 will allow vitals to pass to the resident record within the EMR. For more information, please email careconnection@medline, or reach out to your local Medline sales representative.
closed extra info popup, moving to the next element
_________________________
Vital Signs Patient Monitors - Touch Screen $2,015.95
[....]
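
One caveat about the scrolling loop: it only stops once more than 100 cards are present, so a search with fewer results would keep scrolling forever. Below is a minimal sketch of a stop condition that also gives up when the card count stops growing. The names `scroll_until_done`, `patience` and `FakePage` are hypothetical, and `FakePage` merely simulates a page that loads 10 cards per scroll so the logic can be tried without a browser.

```python
import time


def scroll_until_done(count_items, scroll_down, max_items=100, patience=3, pause=0.0):
    """Scroll until more than max_items are loaded, or until the item
    count has not changed for `patience` consecutive scrolls."""
    stalled = 0
    seen = count_items()
    while seen <= max_items and stalled < patience:
        scroll_down()
        time.sleep(pause)  # give the page time to load the next batch
        current = count_items()
        stalled = stalled + 1 if current == seen else 0
        seen = current
    return seen


class FakePage:
    """Hypothetical stand-in for the browser: loads `step` cards per scroll."""

    def __init__(self, total, step=10):
        self.total = total
        self.step = step
        self.loaded = min(step, total)

    def count(self):
        return self.loaded

    def scroll(self):
        self.loaded = min(self.loaded + self.step, self.total)


if __name__ == "__main__":
    short_page = FakePage(total=35)
    print(scroll_until_done(short_page.count, short_page.scroll, patience=2))
    long_page = FakePage(total=500)
    print(scroll_until_done(long_page.count, long_page.scroll))
```

With Selenium, `count_items` could be something like `lambda: len(browser.find_elements(By.CSS_SELECTOR, '[class^="card"]'))` and `scroll_down` could scroll the footer into view as in the code above, with `pause` set to a couple of seconds.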

Based on these answers, you should now be able to get the data you need, in the form you need it.
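
On the multi-threading idea from the question, here is a minimal sketch, assuming the product links have already been collected from the cards. `unique_links`, `scrape_all` and `fake_scrape` are hypothetical names, and the example.com URLs are placeholders. In a real run, `scrape_one` would open the product page. Selenium WebDriver instances are not thread-safe, so each worker would most likely need its own browser, or plain HTTP requests if the product pages render without JavaScript (not verified for this site).

```python
from concurrent.futures import ThreadPoolExecutor


def unique_links(hrefs):
    """Drop empty values and duplicates while keeping first-seen order."""
    seen = set()
    ordered = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            ordered.append(href)
    return ordered


def scrape_all(links, scrape_one, max_workers=4):
    """Call scrape_one(link) for every link on a thread pool.
    pool.map returns results in the same order as the input links."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_one, links))


def fake_scrape(link):
    # Hypothetical stand-in for the real per-product scraping step.
    return {"url": link, "sku": link.rsplit("/", 1)[-1]}


if __name__ == "__main__":
    collected = [
        "https://example.com/p/1",
        "https://example.com/p/2",
        "https://example.com/p/1",  # duplicate picked up while scrolling
    ]
    for row in scrape_all(unique_links(collected), fake_scrape, max_workers=2):
        print(row)
```

Whether threads actually speed things up depends on how many browsers the machine can run at once and on how the site reacts to parallel requests, so a small `max_workers` is a reasonable starting point.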
