Extract timings within iframe from webpage

Time:09-02

I am trying to extract timings from this page. I would like these timings in text form to do more processing.

I have tried this code:

times = []  # collected timing strings
elements = driver.find_elements(by=By.CLASS_NAME, value="lyr_timeCount")
for elem in elements:
    times.append(elem.text)

using the Selenium driver. However, elements is an empty list. I also tried an XPath locator, with the same result, and BeautifulSoup, again with the same result:

import requests
from bs4 import BeautifulSoup

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
times = soup.find_all('time', {'class': 'lyr_sqResTime'})

Both have resulted in empty lists. How can I extract the timing data from this webpage using either method?

Answer:

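The timing elements are inside an <iframe>, which is a separate document from the main page. Selenium only searches the document it is currently switched to, and requests only downloads the outer page's HTML, which is why both approaches return empty lists. With Selenium you have to:

  • Use WebDriverWait to wait for the desired frame to become available, then switch to it.

  • Use WebDriverWait to wait until all the desired elements are visible.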

  • You can use either of the following locator strategies:

    • Using CSS_SELECTOR:

      driver.get('https://www.capmetro.org/planner/?language=en_US&P=SQ&input=Museum Station (SB), Stop ID 5866&start=yes&widget=1.0.0&')
      # Wait for the Trip Planner iframe, then switch the driver into it
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title='Trip Planner']")))
      # Wait until every timing <span> inside the frame is visible, then print its text
      elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.lyr_timeCount")))
      print([my_elem.text for my_elem in elements])
      
    • Using XPATH:

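      driver.get('https://www.capmetro.org/planner/?language=en_US&P=SQ&input=Museum Station (SB), Stop ID 5866&start=yes&widget=1.0.0&')
      # Wait for the Trip Planner iframe, then switch the driver into it
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@title='Trip Planner']")))
      # Wait until every timing <span> is visible; the class is matched as a whitespace-separated token, like span.lyr_timeCount in CSS
      elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(concat(' ', normalize-space(@class), ' '), ' lyr_timeCount ')]")))
      print([my_elem.text for my_elem in elements])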
      
    • Console Output:

      ['4 min\nMinutes\nconfirmed', '7 min\nMinutes', '16 min\nMinutes\nconfirmed', '20 min\nMinutes\nconfirmed', '21 min\nMinutes\nconfirmed', '23 min\nMinutes\nconfirmed', '37 min\nMinutes', '37 min\nMinutes', '39 min\nMinutes\nconfirmed', '42 min', '4:43 PM', '4:53 PM', '4:53 PM', '5:03 PM', '5:03 PM', '5:13 PM', '5:13 PM', '5:23 PM', '5:23 PM', '5:33 PM', '5:33 PM', '5:43 PM', '5:43 PM', '5:53 PM', '5:53 PM', '6:03 PM', '6:08 PM', '6:13 PM', '6:23 PM', '6:23 PM', '6:33 PM', '6:38 PM', '6:38 PM', '6:43 PM', '6:50 PM', '6:53 PM', '6:58 PM', '7:05 PM', '7:08 PM', '7:13 PM']
      
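  • Note: you have to add the following imports and create a WebDriver instance named driver (Chrome is shown here):

     from selenium import webdriver
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC

     driver = webdriver.Chrome()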
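Since the goal is further processing, note that each scraped string carries extra lines: the first line holds the value ("4 min" or "4:43 PM"), and some entries add "Minutes" and "confirmed". The following is a minimal sketch, assuming that first-line layout holds for every entry (inferred only from the console output above). Timing and parse_timing are hypothetical names, not part of Selenium or the CapMetro page.

```python
import re
from typing import NamedTuple


class Timing(NamedTuple):
    kind: str        # "relative" for "N min", "clock" for "H:MM AM/PM", else "unknown"
    minutes: int     # minutes from now (relative) or minutes since midnight (clock); -1 if unknown
    confirmed: bool  # True if the scraped text contained a separate "confirmed" line


# Assumed formats, inferred from the console output: "4 min" and "4:43 PM"
RELATIVE_RE = re.compile(r"(\d+)\s*min")
CLOCK_RE = re.compile(r"(\d{1,2}):(\d{2})\s*([AP]M)")


def parse_timing(text: str) -> Timing:
    """Parse one element's .text, assuming the value is on the first non-empty line."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    head = lines[0] if lines else ""
    confirmed = "confirmed" in lines[1:]

    m = RELATIVE_RE.fullmatch(head)
    if m:
        return Timing("relative", int(m.group(1)), confirmed)

    m = CLOCK_RE.fullmatch(head)
    if m:
        hour, minute, period = int(m.group(1)), int(m.group(2)), m.group(3)
        # 12-hour clock to 24-hour: 12 AM -> 0, 12 PM -> 12, 4 PM -> 16
        hour = hour % 12 + (12 if period == "PM" else 0)
        return Timing("clock", hour * 60 + minute, confirmed)

    return Timing("unknown", -1, confirmed)


if __name__ == "__main__":
    # A few strings copied from the console output above
    sample = ['4 min\nMinutes\nconfirmed', '7 min\nMinutes', '42 min', '4:43 PM', '7:13 PM']
    for raw in sample:
        print(repr(raw), "->", parse_timing(raw))
```

Keeping relative and clock-time entries as separate kinds avoids guessing a date for the clock times; turning either into a datetime would need the current time, which this sketch deliberately leaves out.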
    
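For the requests/BeautifulSoup route, the same iframe explains the empty result: page.content holds only the outer page, and the browser loads the frame's document separately from the iframe's src URL. This sketch finds that URL so it could be requested next. It uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free; IframeFinder and iframe_urls are hypothetical helpers, and the sample HTML and example.com URLs are made up for illustration. The frame's document may itself fill in the times with JavaScript, in which case requests still will not see them and a browser-driven approach like Selenium is needed.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect (title, src) pairs for every <iframe> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.frames: List[Tuple[Optional[str], Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag == "iframe":
            attr_map = dict(attrs)
            self.frames.append((attr_map.get("title"), attr_map.get("src")))


def iframe_urls(html: str, base_url: str, title: Optional[str] = None) -> List[str]:
    """Return absolute src URLs of iframes, optionally only those with a matching title."""
    finder = IframeFinder()
    finder.feed(html)
    return [
        urljoin(base_url, src)  # resolve relative src values against the page URL
        for frame_title, src in finder.frames
        if src and (title is None or frame_title == title)
    ]


if __name__ == "__main__":
    # Hypothetical outer page; in practice this would be requests.get(url).text
    outer_html = (
        '<html><body>'
        '<iframe title="Trip Planner" src="widget/trip.html"></iframe>'
        '<iframe src="https://ads.example/banner"></iframe>'
        '</body></html>'
    )
    print(iframe_urls(outer_html, "https://example.com/planner/", title="Trip Planner"))
```

Filtering on the title mirrors the iframe[title='Trip Planner'] locator used in the Selenium answer.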