Dynamic Web scraping using Selenium

Time:04-08

I am trying to scrape the tables from the dynamic webpage below. I am using the code below to find the table data (the rows are under the tag name tr), but I get an empty list as output. Is there anything I am missing here?

Please find a screenshot of the page's inspected HTML below.

CodePudding user response:

The website uses iframes, so you need to switch into the desired iframe to access the data. I haven't tested this code, but it should work.

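from selenium.webdriver.common.by import By

# Locate the iframe and switch the driver's context into it
iframe = driver.find_element(By.XPATH, "//iframe[@id='IframeId']")
driver.switch_to.frame(iframe)

# Now you can get the data
trs = driver.find_elements(By.TAG_NAME, 'tr')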
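
Once the driver has switched into a frame, later lookups stay scoped to that frame until you switch back out. As a minimal sketch of one way to keep that under control, the hypothetical helper `inside_frame` below (not part of Selenium) wraps Selenium's `driver.switch_to.frame(...)` and `driver.switch_to.default_content()` in a context manager. It assumes `driver` is an already created WebDriver and that the frame id is `IframeId`, as in the answer above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a `with` block.

    `inside_frame` is a hypothetical helper name. `frame_reference` can be
    anything Selenium's `switch_to.frame` accepts: a frame name or id,
    an index, or a WebElement for the <iframe>.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even if scraping fails.
        driver.switch_to.default_content()


# Usage sketch (assumes `driver` exists and `By` has been imported):
#
#     with inside_frame(driver, "IframeId"):
#         trs = driver.find_elements(By.TAG_NAME, "tr")
```

The `finally` clause is the design choice here: if a lookup inside the frame raises, the driver is still returned to the main document, so later code does not silently search the wrong context.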

CodePudding user response:

The desired elements are within an <iframe> so you have to:

  • Use WebDriverWait to wait until the desired frame is available and switch to it.

  • Use WebDriverWait with visibility_of_all_elements_located to wait until the desired elements are visible.

  • You can use the following locator strategy:

  • Using XPATH:

    driver.get("https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d")
    WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='IframeId']")))
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='container-fluid']//div[@class='span6']/strong")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    ['核能(Nuclear)', '燃煤(Coal)', '汽電共生(Co-Gen)', '民營電廠-燃煤(IPP-Coal)', '燃氣(LNG)', '民營電廠-燃氣(IPP-LNG)', ' 燃油(Oil)', '輕油(Diesel)', '水力(Hydro)', '風力(Wind)', '太陽能(Solar)', '抽蓄發電(Pumping Gen)']
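
One of the labels in the console output above, `' 燃油(Oil)'`, carries a leading space. As a small sketch of how the scraped texts could be tidied before further use, the hypothetical helper `clean_texts` below strips surrounding whitespace and drops empty entries. It only assumes that each item exposes a `.text` attribute, as Selenium WebElements do.

```python
def clean_texts(elements):
    """Return the stripped, non-empty `.text` values of the given elements.

    `clean_texts` is a hypothetical helper name. `elements` is any iterable
    of objects with a `.text` attribute (for example Selenium WebElements).
    """
    cleaned = []
    for element in elements:
        text = element.text.strip()
        if text:  # skip elements whose text is empty or only whitespace
            cleaned.append(text)
    return cleaned


# Usage sketch: pass in the list returned by
# WebDriverWait(...).until(EC.visibility_of_all_elements_located(...)).
```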
    
