Getting None Value/Error from Selenium execute_script, scraping from ASX.au web

Time:10-04

import time


def scrape_asx():
    driver = get_driver()  # the asker's own helper, not shown in the post
    # driver.get('http://www.asx.com.au/asx/statistics/prevBusDayAnns.do')
    driver.execute_script("window.open('http://www.asx.com.au/asx/statistics/prevBusDayAnns.do', '_self');")
    driver.maximize_window()
    time.sleep(5)
    driver.implicitly_wait(10)
    # Dismiss the cookie-consent banner
    driver.find_element_by_css_selector('#onetrust-accept-btn-handler').click()

    time.sleep(10)
    driver.implicitly_wait(5)
    # Try to read the fourth cell of the second table row
    text = driver.execute_script("return document.getElementsByTagName('tr')[1].cells[3].firstElementChild.firstChild;")

    return text

Hi guys, please help me with this issue; I have been stuck on it for a day.

I am trying to extract the title from the contents here (screenshot: target title to be extracted).

And this is the console of the website (screenshot: Console).

Please help! Thanks in advance.

CodePudding user response:

The information you're looking for is inside an iframe, so with Selenium you would need to switch to that iframe first and then locate the element. However, Selenium is a testing tool rather than a web scraping tool, so it should be the last resort when extracting information from a website. The better way forward here is to scrape the source of that iframe directly. If you are only after the titles in that list, you can do the following:

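import pandas as pd  # read_html also needs an HTML parser such as lxml installed

url = 'https://www.asx.com.au/asx/v2/statistics/todayAnns.do'
df = pd.read_html(url)[0]  # first <table> found on the page
print(df)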

The result printed in terminal:

     ASX Code  Date                 Price sens.  Headline
0    MOH       03/10/2022 8:16 PM   NaN          Notice Under Section 708A 1 page 276.3KB
1    MOH       03/10/2022 8:14 PM   NaN          Application for quotation of securities - MOH 8 pages 29.6KB
2    92E       03/10/2022 8:03 PM   NaN          Letter to Shareholders 1 page 137.2KB
3    92E       03/10/2022 8:02 PM   NaN          Notice of Annual General Meeting/Proxy Form 37 pages 673.7KB
4    NST       03/10/2022 8:01 PM   NaN          Notice of 2022 Annual General Meeting 38 pages 1.4MB
...  ...       ...                  ...          ...
695  VHT       03/10/2022 7:33 AM   NaN          Volpara CEO appointed to the Board as Managing Director 2 pages 148.1KB
696  A2M       03/10/2022 7:33 AM   NaN          Renewal of arrangements with China State Farm 2 pages 254.1KB
697  PVL       03/10/2022 7:32 AM   NaN          Appendix 4G and Corporate Governance Statement 20 pages 446.3KB
698  NTL       03/10/2022 7:32 AM   NaN          Cleansing Notice 1 page 153.6KB
699  NTL       03/10/2022 7:32 AM   NaN          Proposed issue of securities - NTL 5 pages 27.5KB
700 rows × 4 columns

If you need more information from that page, you can extract it with Requests and BeautifulSoup, without the overhead of Selenium.
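As a rough illustration of the Requests and BeautifulSoup route, here is a minimal sketch. It assumes the page still serves a plain HTML table with the four columns shown in the output above, and that the headline cell wraps its text in a link; the function names parse_announcements and fetch_announcements and the dictionary keys are made up for this example.

```python
import requests
from bs4 import BeautifulSoup

# URL copied from the pandas snippet; whether it still serves this table is an assumption.
URL = "https://www.asx.com.au/asx/v2/statistics/todayAnns.do"


def parse_announcements(html):
    """Return one dict per table row that has at least four <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 4:
            continue  # rows built only from <th> cells (headers) are skipped
        link = cells[3].find("a")  # assumption: the headline is wrapped in a link
        rows.append({
            "code": cells[0].get_text(strip=True),
            "date": cells[1].get_text(" ", strip=True),
            "headline": cells[3].get_text(" ", strip=True),
            "href": link.get("href") if link else None,
        })
    return rows


def fetch_announcements(url=URL):
    """Download the page and parse its announcement rows."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return parse_announcements(response.text)
```

Calling fetch_announcements() should give a list of dictionaries, one per announcement. Keeping the parsing in its own function means it can be tried on a saved copy of the page without hitting the site again.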

Relevant pandas docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
