I am learning how to scrape webpages and I ran into an issue with this website: https://www.centris.ca/en/properties~for-sale~brossard?view=Thumbnail
During the script execution, it randomly shows me a popup asking to subscribe: https://imgur.com/a/tzCVvg4
I already have code to handle it, but it pops up at completely random intervals.
With my current selection criteria I have to scrape 41 pages. Sometimes the popup shows up at page 2, right before I click next page; sometimes it shows up at page 39, right as I am grabbing the price of a particular listing.
I can't just let the page sit there and wait, because I tried that and sometimes the popup doesn't show up for a solid 10 minutes, and sometimes it shows at the 5-minute or 2-minute mark (since the start of the script).
If I visit the page manually, I get this issue far less often. I can click through all the listings and not get the popup even once.
I am at a loss as to how to handle this.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

url = 'https://www.centris.ca/en/properties~for-sale~brossard?view=Thumbnail'
def scrap_pages(driver):
    listings = driver.find_elements(By.CLASS_NAME, 'description')
    # drop the trailing empty card the results grid sometimes appends
    if listings[-1].text.split('\n')[0] == '':
        del listings[-1]
    for listing in listings:
        # the price sits in a <meta> tag, which has no rendered text,
        # so read its 'content' attribute instead of .text
        price = listing.find_element(By.XPATH, ".//div[@class='price']/meta[@itemprop='price']").get_attribute('content')
        mls = listing.find_element(By.XPATH, ".//div[@id='MlsNumberNoStealth']/p").text
        prop_type = listing.find_element(By.XPATH, ".//div[@class='location-container']/span[@itemprop='category']").text
        addr = listing.find_element(By.XPATH, ".//div[@class='location-container']/span[@class='address']").text
        city = addr.split('\n')[1]
        sector = addr.split('\n')[2]
        if prop_type == 'Land for sale' or prop_type == 'Lot for sale':
            bedrooms = 'NA'
            bathrooms = 'NA'
        else:
            bedrooms = listing.find_element(By.XPATH, ".//div[@class='cac']").text
            bathrooms = listing.find_element(By.XPATH, ".//div[@class='sdb']").text
        listing_item = {
            'mls': mls,
            'price': price,
            'Address': addr,
            'property Type': prop_type,
            'city': city,
            'bedrooms': bedrooms,
            'bathrooms': bathrooms,
            'sector': sector
        }
        centris_list.append(listing_item)
if __name__ == '__main__':
    chrome_options = Options()
    chrome_options.add_experimental_option("detach", True)
    # chrome_options.add_argument("headless")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    centris_list = []
    driver.get(url)
    # the pager reads like "1 / 41"
    total_pages = int(driver.find_element(By.CLASS_NAME, 'pager-current').text.split('/')[1].strip())
    for i in range(1, total_pages + 1):  # range(1, total_pages) would skip the last page
        scrap_pages(driver)
        if i < total_pages:  # there is no next page to click after the last one
            driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
            time.sleep(3)
        if len(driver.find_elements(By.XPATH, ".//div[@class='DialogInsightLightBoxCloseButton']")) > 0:
            driver.find_element(By.XPATH, ".//div[@class='DialogInsightLightBoxCloseButton']").click()
            time.sleep(0.6)
            print('found subscription box')
CodePudding user response:
There are ways to disable pop-ups in Chrome, but they rarely work. You can search for disabling pop-ups with Chrome options, but I doubt any of them will help in this case.
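For reference, the flags people usually try look something like this (a sketch only, reusing the Options setup from the question; these switches control browser-level pop-up windows and notification permission prompts, not a lightbox the site injects into its own DOM, which is why they don't help here):

chrome_options = Options()
chrome_options.add_argument("--disable-popup-blocking")  # real pop-up windows only
chrome_options.add_argument("--disable-notifications")   # web push permission prompts
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)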
I can suggest a more elegant solution instead:
scrap_pages(driver)
driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
time.sleep(3)
try:
    driver.find_element(By.CSS_SELECTOR, 'div.DialogInsightLightBoxCloseButton').click()
    print('pop-up closed')
except (NoSuchElementException, ElementNotInteractableException):
    pass
For this to work you need to import the exception classes:
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException
Another option is to surround the whole 'scrape page, click next' block with a try block. But in that case you will need to catch a different error, ElementClickInterceptedException. The code will look like this:
try:
    scrap_pages(driver)
    driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
except ElementClickInterceptedException as initial_error:
    # the click was blocked, most likely by the pop-up: close it and retry
    try:
        driver.find_element(By.CSS_SELECTOR, 'div.DialogInsightLightBoxCloseButton').click()
        print('pop-up closed')
        scrap_pages(driver)
        driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
    except NoSuchElementException:
        # something else intercepted the click; re-raise the original error
        raise initial_error
But notice that this version repeats the same two lines
scrap_pages(driver)
driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
twice, once in the try and once in the except. Moreover, the pop-up can also appear after the retry finally clicks the link, which would still break the scraping of the next page. So the first option looks better.
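If you want to avoid that duplication, a third variant is to pull the dismissal into a small helper and call it before every interaction. This is only a sketch building on the code above: the close_popup name is mine, not part of the original script, and it assumes the close button keeps the DialogInsightLightBoxCloseButton class from the question.

def close_popup(driver):
    # find_elements returns an empty list instead of raising,
    # so no try/except is needed when the pop-up is absent
    buttons = driver.find_elements(By.CSS_SELECTOR, 'div.DialogInsightLightBoxCloseButton')
    if buttons:
        buttons[0].click()
        print('pop-up closed')

# usage inside the question's page loop:
for i in range(1, total_pages + 1):
    close_popup(driver)   # clear any overlay before scraping
    scrap_pages(driver)
    close_popup(driver)   # and again right before clicking next
    if i < total_pages:
        driver.find_element(By.CSS_SELECTOR, 'li.next > a').click()
        time.sleep(3)

The pop-up can still appear in the instant between the check and the click, so keeping the ElementClickInterceptedException handler from the second option around the click, as a last-resort fallback, is still a good idea.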