Getting None Value/Error from Selenium execute_script, scraping from ASX.au web-CodePudding

def scrape_asx():

    driver= get_driver()
    #driver.get('http://www.asx.com.au/asx/statistics/prevBusDayAnns.do')
    driver.execute_script("window.open('http://www.asx.com.au/asx/statistics/prevBusDayAnns.do', '_self');")
    driver.maximize_window()
    time.sleep(5)
    driver.implicitly_wait(10)
    driver.find_element_by_css_selector('#onetrust-accept-btn-handler').click()
 
    time.sleep(10)
    driver.implicitly_wait(5)
    text = driver.execute_script("return document.getElementsByTagName('tr')[1].cells[3].firstElementChild.firstChild;")

    return text

Hi Guys, please help me with this issue, I have been stuck here for a day.

I am trying to extract the title from the contents here.

And is the console of the website.

Please help!!Thanks in advance.

CodePudding user response：

The information you're looking for is in an iframe, so - using selenium - you would need to switch to iframe, and then locate it. However, selenium is not a web scraping tool, but a testing tool: it should be the last call when trying to extract information from a website. The optimal way forward here is to scrape the source of that iframe. If you are only after the titles in that list, you can do the following:

url = 'https://www.asx.com.au/asx/v2/statistics/todayAnns.do'
df = pd.read_html(url)[0]
print(df)

The result printed in terminal:

    ASX Code    Date    Price sens. Headline
0   MOH 03/10/2022 8:16 PM  NaN Notice Under Section 708A 1 page 276.3KB
1   MOH 03/10/2022 8:14 PM  NaN Application for quotation of securities - MOH 8 pages 29.6KB
2   92E 03/10/2022 8:03 PM  NaN Letter to Shareholders 1 page 137.2KB
3   92E 03/10/2022 8:02 PM  NaN Notice of Annual General Meeting/Proxy Form 37 pages 673.7KB
4   NST 03/10/2022 8:01 PM  NaN Notice of 2022 Annual General Meeting 38 pages 1.4MB
... ... ... ... ...
695 VHT 03/10/2022 7:33 AM  NaN Volpara CEO appointed to the Board as Managing Director 2 pages 148.1KB
696 A2M 03/10/2022 7:33 AM  NaN Renewal of arrangements with China State Farm 2 pages 254.1KB
697 PVL 03/10/2022 7:32 AM  NaN Appendix 4G and Corporate Governance Statement 20 pages 446.3KB
698 NTL 03/10/2022 7:32 AM  NaN Cleansing Notice 1 page 153.6KB
699 NTL 03/10/2022 7:32 AM  NaN Proposed issue of securities - NTL 5 pages 27.5KB
700 rows × 4 columns

If you need more info from that page, you can extract it with Requests & BeautifulSoup, without the overheads of Selenium.

Relevant pandas docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html