Selenium Web Scraping (InvalidSelectorException)

Time:02-28

I am trying to scrape a website while navigating through its pages.

This code worked when I scraped another website. I am now trying to scrape a different website, and it fails.

It raises "selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified".

I can't find the issue. Can anyone help me solve this? Many thanks!

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())
)

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://www.28hse.com/buy/residential/a1")

for i in range(3):
    all_cells = driver.find_elements(By.CSS_SELECTOR, ".item property_item")
    
    for cell in all_cells:
        names = cell.find_elements(By.CSS_SELECTOR, ".areaUnitPrice.separation").text

        if len(names) > 0:
            print(names[0].text)
        else:
            print("--- no name ---")

    next_btn = driver.find_element(By.CSS_SELECTOR, ".pagination_hi[2]")
    next_btn.click()
    try:
        print(f"Done page {i + 1}")
        element = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".product-brief-wrapper"))
        )
    except:
        print("Wait failed")

CodePudding user response:

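This CSS selector:

.pagination_hi[2]

is not valid CSS and is what triggers the InvalidSelectorException: square brackets in CSS denote an attribute selector, not an XPath-style index. Possibly you intended to use: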

.pagination

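Additionally, you need to change the following selector, which as written matches a <property_item> tag nested inside an element of class item rather than one element with both classes: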

.item property_item

to

.item.property_item
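To make the two selector problems above concrete, here is a small pure-Python sketch. The names class_selector and has_xpath_style_index are hypothetical helpers written for illustration, not Selenium APIs, and the regular expression is only a rough heuristic for spotting a bare numeric index such as [2].

```python
import re


def class_selector(*class_names):
    """Build a CSS selector for ONE element that carries all given classes."""
    # Each class gets its own leading dot and nothing separates them:
    # a space would instead mean "descendant of".
    return "".join("." + name for name in class_names)


def has_xpath_style_index(selector):
    """Return True if the selector contains a bare numeric index like [2]."""
    # In CSS, [...] is an attribute selector, so a lone number inside the
    # brackets is a leftover XPath habit and is rejected by the browser.
    return re.search(r"\[\s*\d+\s*\]", selector) is not None


print(class_selector("item", "property_item"))
print(has_xpath_style_index(".pagination_hi[2]"))
print(has_xpath_style_index(".pagination_hi:nth-of-type(2)"))
```

Running a check like this over selectors before handing them to find_element is optional; it only catches this one class of mistake.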

CodePudding user response:

As @undetected Selenium mentioned, there are several errors in your code.

1.

all_cells = driver.find_elements(By.CSS_SELECTOR, ".item property_item")

In a CSS selector, each class name must be prefixed with a dot, so this should be

all_cells = driver.find_elements(By.CSS_SELECTOR, ".item.property_item")

2.

names = cell.find_elements(By.CSS_SELECTOR, ".areaUnitPrice.separation").text

find_elements() returns a list of elements, and a list has no text attribute.

So this should be

names = cell.find_elements(By.CSS_SELECTOR, ".areaUnitPrice.separation")

3. Pagination link

next_btn = driver.find_element(By.CSS_SELECTOR, ".pagination_hi[2]")

If two such elements are present on the page and you want to select the second one with a CSS selector, you should use

.pagination_hi:nth-of-type(2)

Note that :nth-of-type counts sibling elements of the same tag name, so this only works if the two elements share a parent. Ideally your code should look like

next_btn = driver.find_element(By.CSS_SELECTOR, ".pagination_hi:nth-of-type(2)")

Or you can use an element that is uniquely identified on the page:

next_btn = driver.find_element(By.CSS_SELECTOR, ".pagination_hi > a[rel='next']")
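Putting the first two fixes together, the inner loop could be sketched as below. This is a minimal sketch, not the poster's final code: unit_prices is a hypothetical function name, and root is assumed to be a Selenium driver (or any object with a find_elements(by, value) method). The literal "css selector" is the string value of By.CSS_SELECTOR and is used only so the sketch has no browser dependency; in real code, pass By.CSS_SELECTOR.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def unit_prices(root, missing="--- no name ---"):
    """Return one text per listing cell found under `root`."""
    results = []
    for cell in root.find_elements(CSS, ".item.property_item"):
        # find_elements returns a list, so .text is read from an element
        # inside the list, never from the list itself.
        matches = cell.find_elements(CSS, ".areaUnitPrice.separation")
        results.append(matches[0].text if matches else missing)
    return results
```

With a live driver this would be called once per page, for example for text in unit_prices(driver): print(text), before clicking the pagination link.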

CodePudding user response:

Thank you for the reply! What does this mean:

(1) selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable

Is it related to next_btn?

(2) How am I supposed to scrape the text in .areaUnitPrice.separation?

Many thanks!
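On (1): this exception generally means the element was found in the DOM but Selenium could not interact with it, for example because it is hidden or disabled, so next_btn.click() is a plausible source. A minimal diagnostic sketch follows; why_not_interactable is a hypothetical helper name, and it assumes only that the object passed in offers is_displayed() and is_enabled(), as Selenium WebElements do. It cannot detect every cause (an element covered by another one, for instance).

```python
def why_not_interactable(element):
    """Return a short hint on why a click may fail, or None if none is found."""
    if not element.is_displayed():
        # Present in the DOM, but hidden or rendered with no visible size.
        return "element is in the DOM but not visible"
    if not element.is_enabled():
        return "element is visible but disabled"
    return None

# Hypothetical usage with a live driver:
#     hint = why_not_interactable(next_btn)
#     if hint:
#         print(hint)
```

If the button only becomes usable once the page has settled, waiting with WebDriverWait and EC.element_to_be_clickable (both already imported in the question's code) before calling click() is a common approach.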
