Why does selenium only get a part of the HTML from a webpage?

Time:10-30

I tried to use Selenium to navigate the WebUntis page and look for any tests taking place this week, but I can't find the web elements that contain the lesson data in the HTML. Everything else, such as the menu, is in the page source, but the timetable subtree is missing.

So far, I have tried waiting longer to give the page time to load, using different search patterns, and moving up the tree. Selenium never found anything and always returned an empty list.

Here is the code I have written so far (unoptimised prototype):

import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# setting webdriver up
url = "https://klio.webuntis.com"
path = r"C:\Program Files (x86)/chromedriver.exe"  # raw string avoids the invalid "\P" escape; unused below, webdriver_manager supplies the driver
options = Options()
options.add_argument("headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get(url)
time.sleep(1.0)

###################################################################################################

# getting to login screen
schoolParent = driver.find_element(By.CLASS_NAME, "Select-input")
schoolInput = schoolParent.find_elements(By.XPATH, "*")[0]
schoolInput.send_keys("<schoolName>")
time.sleep(0.6)
schoolInput.send_keys(Keys.RETURN)
time.sleep(1.0)

# logging in
userInputs = driver.find_elements(By.CLASS_NAME, "un-input-group__input")
for element in userInputs:
    if element.get_attribute("type") == "text":
        element.send_keys("<username>")
    elif element.get_attribute("type") == "password":
        element.send_keys("<password>")
        element.send_keys(Keys.RETURN)
time.sleep(1.0)

# going to timetable
buttons = driver.find_elements(By.CLASS_NAME, "item-container")
time.sleep(0.5)
for element in buttons:
    if element.text == "Mein Stundenplan":
        element.click()
time.sleep(5)
thisWeek = bs(driver.page_source, "html.parser")
test = driver.find_elements(By.CLASS_NAME, "un-timetable__dnd-avatar un-timetable__dnd-avatar--hidden")
print(test)
# time.sleep(5.0)
# buttons = driver.find_elements(By.CSS_SELECTOR, "button[type=button]")
# for element in buttons:
#     print(element.get_attribute("class"))                                 come back later...
#     if element.get_attribute("class") == "btn btn-default":
#         print("should work...")
time.sleep(6.0)

####################################################################################################

# printing html-file
print(thisWeek.prettify())

By the way, the class I am searching for does exist when I inspect the element in Chrome (screenshot not included here).

CodePudding user response:

The element is inside an <iframe>, so you have to switch into that frame before searching for it. You could use WebDriverWait:

WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"id_of_the_iframe")))

Also, By.CLASS_NAME accepts only a single class name, so use a CSS selector or XPath to match multiple classes:

driver.find_elements(By.CSS_SELECTOR, '.un-timetable__dnd-avatar.un-timetable__dnd-avatar--hidden')
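If you copy the class attribute straight from the browser inspector, a small helper can turn it into a compound CSS selector (one leading dot per class, no spaces). This is only a sketch: classes_to_css is a hypothetical helper name, not part of Selenium, and it assumes the class names need no CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute value into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both classes; ".a .b" would mean "b inside a".
    return "".join("." + name for name in names)


# Class attribute value taken from the question.
selector = classes_to_css("un-timetable__dnd-avatar un-timetable__dnd-avatar--hidden")
print(selector)
```

The resulting string can then be passed as the second argument to driver.find_elements(By.CSS_SELECTOR, selector).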

Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
url = 'https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe?retiredLocale=de'

driver.get(url)
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"frame_a_simple_iframe")))

print(driver.find_element(By.CSS_SELECTOR,'h1').text)
# --> <iframe>: The Inline Frame element
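If you do not know the id of the iframe in advance, one option is to try each top-level iframe in turn. The following is a rough sketch, not a definitive implementation: find_in_frames is a hypothetical helper, it does not descend into nested iframes, and it uses the plain locator strings "css selector" and "tag name", which are the values of By.CSS_SELECTOR and By.TAG_NAME.

```python
def find_in_frames(driver, css_selector):
    """Search the top-level document, then each top-level iframe.

    Returns the first non-empty list of matches. If the matches come from
    an iframe, the driver is left switched into that frame so the elements
    stay usable. Returns [] (driver back at the top level) if nothing matches.
    """
    driver.switch_to.default_content()
    found = driver.find_elements("css selector", css_selector)
    if found:
        return found

    # Collect the frames while still in the top-level document.
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        found = driver.find_elements("css selector", css_selector)
        if found:
            return found
        # Nothing here: go back up before trying the next frame.
        driver.switch_to.default_content()
    return []
```

With a live driver this would be called as find_in_frames(driver, ".un-timetable__dnd-avatar.un-timetable__dnd-avatar--hidden"). Remember to call driver.switch_to.default_content() afterwards if you need elements outside the frame again.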

Alternatively, work on driver.page_source, which returns the HTML of the frame the driver is currently switched into:

print(driver.page_source)
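To see why the timetable is missing from the top-level source, here is a small offline illustration using only the standard library. The two HTML strings are made-up stand-ins (assumptions, not the real WebUntis markup) for what page_source might return before and after switching into the frame: the outer document contains only the <iframe> tag itself, while the framed document holds the elements.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.count += 1


def count_class(html, wanted):
    parser = ClassCounter(wanted)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-in for page_source at the top level.
outer_html = (
    '<html><body><nav class="menu"></nav>'
    '<iframe id="embedded"></iframe></body></html>'
)
# Hypothetical stand-in for page_source after switching into the frame.
frame_html = (
    '<html><body><div class="un-timetable__dnd-avatar '
    'un-timetable__dnd-avatar--hidden"></div></body></html>'
)

print(count_class(outer_html, "un-timetable__dnd-avatar"))
print(count_class(frame_html, "un-timetable__dnd-avatar"))
```

The same applies when the source is handed to BeautifulSoup: parse page_source only after switching into the frame, otherwise the timetable elements are not in the string at all.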