I am pretty new to web-scraping...
For example, here is part of my code:
labels = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-label.placeFinder-search__checkbox-label')
checkboxes = driver.find_elements(By.CLASS_NAME, 'form__item-checkbox-input.placeFinder-search__checkbox-input')
boxes = zip(labels,checkboxes)
time.sleep(3)
for label, checkbox in boxes:
    if checkbox.is_selected():
        label.click()
Here is another example:
driver.get(product_link)
time.sleep(3)
button = driver.find_element(By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2')
time.sleep(3)
button.click()
And I am scraping through, let's say, hundreds of products. 90% of the time it works fine, but occasionally it gives errors like "couldn't locate the element" or "element is not clickable", etc. Yet all these product pages are built the same. Moreover, if I just re-run the code on the product that resulted in the error, most of the time from the 2nd or 3rd try I am able to scrape the data and the error does not come back.
Why does this happen? The code stays the same, the web page stays the same... What is causing the error when it happens? The only thing that comes to my mind is that the Internet connection sometimes gets behind the code and the program is unable to see the elements it is looking for... But as you can see I have added time.sleep(), and it does not always help...
How can this be avoided? It is really annoying to be forced to stay in front of the monitor all day just to supervise and re-run the code... I mean, I guess I could just wrap the scrape function inside a try/except/else block, but I am still wondering why the same code will sometimes work and sometimes return an error on the same page?
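For illustration, the kind of retry wrapper I have in mind would look roughly like this (scrape_product is just a placeholder for my actual scraping code):
import time

def scrape_with_retry(driver, product_link, attempts=3):
    # Naive retry: re-run the scrape a few times before giving up on a product
    for attempt in range(attempts):
        try:
            driver.get(product_link)
            return scrape_product(driver)  # placeholder for the actual scraping logic
        except Exception as exc:
            print(f"Attempt {attempt + 1} failed for {product_link}: {exc}")
            time.sleep(3)
    return None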
CodePudding user response:
Welcome to the "dirty" side of web automation. We call these "flaky" tests; in other words, they are "fragile". This is the major disadvantage of Selenium WebDriver.
There can be several reasons for a flaky situation:
- Network instability: every command travels over the network (client -> Selenium Grid, if used -> browser driver -> actual browser), so any connection hiccup along the way can cause a failure.
- CSS animations: commands are executed immediately, so if an element is still in the middle of an animated transition, clicking it may fail.
- Ajax requests or dynamically changing elements: if the page has "load more" behaviour or elements that appear only after some action, the element may not be detected yet or may still be overlapped by something else.
And one last comment: sleep is not a good idea to use; it is actually against best practices. Instead, use Expected Conditions to ensure elements are visible and ready, as in the sketch below.
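A rough sketch of what that could look like for your second snippet (replacing the sleeps with an explicit wait; same locator as in the question):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get(product_link)
# Wait up to 10 seconds until the tab header is actually clickable, instead of sleeping
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="tab-panel__tab--product-pos-search"]/h2')))
button.click()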
CodePudding user response:
In short, Selenium deals with three distinct states of a WebElement: presence, visibility, and interactability (clickable).
Ideally, to click on any clickable element you need to induce WebDriverWait for the element_to_be_clickable() as follows:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='tab-panel__tab--product-pos-search']/h2"))).click()
Similarly, you can create a list of the desired elements by waiting for their visibility, and then click on them one by one, waiting for each of them to become clickable, as follows:
checkboxes = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "form__item-checkbox-input.placeFinder-search__checkbox-input")))
for checkbox in checkboxes:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(checkbox)).click()
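Applied to the first snippet from the question (clicking the label only when its checkbox is selected), the loop could look roughly like this; note the imports that the snippets above assume:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for both lists to be visible instead of sleeping, then pair them up as before
labels = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'form__item-checkbox-label.placeFinder-search__checkbox-label')))
checkboxes = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'form__item-checkbox-input.placeFinder-search__checkbox-input')))
for label, checkbox in zip(labels, checkboxes):
    if checkbox.is_selected():
        # Deselect by clicking the label, once it is actually clickable
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable(label)).click()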