Why are elements missing in HTML while using selenium?

Time:12-26

I am trying to scrape the text of the labels in the Product Search section of this page:

url='https://www.hydac.com/shop/en/GP_1000188028'

I've tried all the solutions I know but got nowhere.

Here is my code:

items=soup.find_all('div',attrs={'class':'filter-options-item'})
for item in items:
    p=(item.find('label',attrs={'data-bind':'attr: {for: id}'})).find_all('span')
    for q in p:
        print(q.text)

CodePudding user response:

The website you are scraping probably uses Cloudflare or a similar tool to prevent bots from scraping the site.

CodePudding user response:

BeautifulSoup only parses the HTML it is given; it does not request or render pages, and rendering seems to be your issue.

Check how the website behaves in your browser: it needs some time to render the labels, so you simply have to wait until they are there.
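To illustrate the point about parsing versus rendering, here is a minimal sketch. It uses the standard library's HTMLParser as a stand-in so it runs without extra packages; BeautifulSoup behaves the same way in this respect. The markup in RAW_HTML is hypothetical, loosely modelled on the data-bind attributes from the question, and the names SpanTextCollector and span_texts are made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on a Knockout-style template:
# the <span> is an empty placeholder that JavaScript fills in later.
RAW_HTML = """
<div class="filter-options-item">
  <label data-bind="attr: {for: id}">
    <span data-bind="text: label"></span>
  </label>
</div>
"""


class SpanTextCollector(HTMLParser):
    """Collect the non-empty text found inside <span> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <span> tags are currently open
        self.texts = []  # text fragments found inside spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


def span_texts(html):
    """Return the text of every <span> in the given HTML string."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


# Unrendered template: the parser finds the span but no text in it.
print(span_texts(RAW_HTML))  # []

# The same span after a browser has run the page's JavaScript.
print(span_texts('<span data-bind="text: label">Aluminium</span>'))  # ['Aluminium']
```

A parser can only return text that is already in the string it receives, so the page source has to be read after the browser has filled in the placeholders.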

Option#1

Simply use time.sleep() to wait:

...
driver.get(url)
time.sleep(5)
...

Option#2

Use Selenium waits (recommended) to solve the issue:

...
driver.get(url)
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[data-bind="text: label"]')))
...
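Why an explicit wait is usually preferable to a fixed sleep can be shown with a small conceptual sketch. This is not Selenium's implementation; wait_until and labels_rendered are hypothetical names, and the "page" is simulated so the snippet runs without a browser. The idea is to poll a condition and return as soon as it holds, instead of always waiting the full time.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page: the labels "appear" on the third check.
checks = {"count": 0}


def labels_rendered():
    checks["count"] += 1
    return ["Aluminium", "Carbon steel"] if checks["count"] >= 3 else []


labels = wait_until(labels_rendered, timeout=2.0, poll_interval=0.01)
print(labels)  # ['Aluminium', 'Carbon steel']
```

With a fixed sleep the script always pays the full delay and still fails if the page is slower than expected; with polling it continues as soon as the labels exist and fails loudly on timeout.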

Example

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = 'https://www.hydac.com/shop/en/GP_1000188028'

driver.get(url)
# Wait up to 10 seconds until the label elements are present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[data-bind="text: label"]')))

# Parse the rendered page source, then close the browser
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print([x.get_text(strip=True) for x in soup.select('#narrow-by-list label')])

Output

['3.5 m/s (piston 2)58',
'0.8 m/s (piston 3)8',
'Aluminium31',
'Carbon steel35',
'NBR / PTFE compound58',
'PUR8',
'10 l6',
'100 l5',
'120 l3',...]
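In the output above each entry has a number fused to the end of the label text (for example 'Aluminium31'), because get_text() joins everything inside the label. If that trailing number is a separate count, which is an assumption here, a rough way to split it off is sketched below. The helper split_label_count is hypothetical, and the split is ambiguous for any label that itself ends in a digit.

```python
import re

# Assumption: every entry ends with a count, and the label itself
# does not end in a digit (otherwise the split is ambiguous).
TRAILING_COUNT = re.compile(r"^(.*\D)(\d+)$")


def split_label_count(text):
    """Split 'Aluminium31' into ('Aluminium', 31); return (text, None) if no trailing digits."""
    match = TRAILING_COUNT.match(text)
    if match is None:
        return text, None
    label, count = match.groups()
    return label.strip(), int(count)


scraped = ['3.5 m/s (piston 2)58', 'Aluminium31', 'NBR / PTFE compound58', '10 l6']
for entry in scraped:
    print(split_label_count(entry))
```

A more robust option would be to read the label text and the number from their own elements in the parsed page instead of splitting the joined string, if the page's markup keeps them separate.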