Unable to scrape kosis.kr even with Selenium

Time:12-14

I am trying to scrape data from the link below, but I cannot get the HTML elements. I am using Selenium with Python. When I call print(driver.page_source), it prints just a bunch of JavaScript, much like what you get when scraping a JavaScript-driven website with BeautifulSoup. I waited longer for the whole page to render, but the Selenium driver still cannot see the rendered HTML elements. How do I scrape it?

https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1JH20151&vw_cd=MT_ETITLE&list_id=J1_10&scrId=&language=en&seqNo=&lang_mode=en&obj_var_id=&itm_id=&conn_path=MT_ETITLE&path=%2Feng%2FstatisticsList%2FstatisticsListIndex.do

I am trying to scrape kosis.kr, but Selenium's driver.page_source is giving me nothing.

CodePudding user response:

The data you are interested in is located in nested iframes on that page. Try this to get the tabular content from there:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1JH20151&vw_cd=MT_ETITLE&list_id=J1_10&scrId=&language=en&seqNo=&lang_mode=en&obj_var_id=&itm_id=&conn_path=MT_ETITLE&path=%2Feng%2FstatisticsList%2FstatisticsListIndex.do"

with webdriver.Chrome() as driver:
    driver.get(link)
    # Switch into the outer iframe, then into the inner iframe nested inside it.
    WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe_rightMenu")))
    WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe_centerMenu1")))
    # Wait for the table rows to be present, then print the cell texts of each row.
    for item in WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"table[id='mainTable'] tr"))):
        data = [i.text for i in item.find_elements(By.CSS_SELECTOR,'th,td')]
        print(data)
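
If you would rather keep the rows than print them, one option is to collect each `data` list and serialize the result as CSV. The following is only a sketch: the helper name `rows_to_csv` is made up for illustration, and it assumes each row is a list of cell strings like the ones printed above. Rows with no `th`/`td` cells come back as empty lists, so they are skipped.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of cell strings) to CSV text.

    Hypothetical helper, not part of Selenium. Empty rows are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        if row:  # skip rows that had no th/td cells
            writer.writerow(row)
    return buffer.getvalue()


# Illustrative values only, not real data from the site.
sample_rows = [["Year", "Value"], [], ["2020", "1,234"]]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Inside the loop above you would append `data` to a list instead of printing it, then pass that list to the helper and write the returned text to a file.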

CodePudding user response:

Simply wait until the loading is finished, for example until the following is true:

$("#Loading").is(":visible") == false;

