How to extract data from a table within morningstar.com using Selenium and Python

Time:06-18

I would like to grab the data in the Key Ratios table from the URL below: https://financials.morningstar.com/ratios/r.html?t=0P000000B7&culture=en&platform=sal

I tried to access it with Selenium using XPath, but in vain, even after switching to the iframe.

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time

url = 'https://www.morningstar.com/stocks/xnas/amzn/quote'
browser = webdriver.Firefox()
    
#Open the URL in browser
browser.get(url)
   
#####Click Key ratio button#####
#Wait until the Key ratio button is clickable
element = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="keyStats"]')))
#Click keyRatioBtn
element.click()
    
#####Click Full Key ratio url#####
#Wait until the Full Key ratio url is clickable
element = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[@]')))
#Click Full Key ratio url
element.click()
    
#####Get ROE list#####
browser.implicitly_wait(20)
time.sleep(5)
iframe = browser.find_elements(By.TAG_NAME, 'iframe')    
browser.switch_to.frame(1)
roeList = browser.find_element_by_xpath('//*[@id="tab-profitability"]')
print(roeList.get_attribute('innerHTML'))

CodePudding user response:

You can grab data in the Key ratio table using selenium with pandas as follows:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# Chrome options: hide the automation banner and keep the browser open afterwards
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-infobars")
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

URL = 'https://financials.morningstar.com/ratios/r.html?t=0P000000B7&culture=en&platform=sal'
driver.get(URL)

# Wait for the profitability tab to render, then grab its HTML
table = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="tab-profitability"]'))).get_attribute("outerHTML")
# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(table))[0]
# Drop the empty spacer rows that sit between the data rows
print(df.dropna(how='all'))

Output:

 Margins % of Sales 2012-12 2013-12 2014-12  ... 2019-12 2020-12 2021-12     TTM
1               Revenue  100.00  100.00  100.00  ...  100.00  100.00  100.00  100.00
3                  COGS   93.23   93.12   93.04  ...   86.16   86.66   85.89   86.59
5          Gross Margin    6.77    6.88    6.96  ...   13.84   13.34   14.11   13.41
7                  SG&A    5.41    5.72    6.61  ...    8.58    7.43    8.81    9.23
9                   R&D       —       —       —  ...       —       —       —       —
11                Other    0.26    0.15    0.15  ...    0.07   -0.02    0.01    0.06
13     Operating Margin    1.11    1.00    0.20  ...    5.18    5.93    5.30    4.12
15  Net Int Inc & Other   -0.22   -0.32   -0.32  ...   -0.20    0.33    2.82    0.61
17           EBT Margin    0.89    0.68   -0.12  ...    4.98    6.26    8.12    4.73

[9 rows x 12 columns]
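
The frame printed above keeps the original row labels (1, 3, 5, ...) and stores every value as text, with "—" standing in for missing figures. As a minimal sketch, assuming you want numeric columns for further calculations, the clean-up could look like the following. The small hand-built frame reuses a few values from the output and only stands in for the real df; the helper name clean_ratio_table is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_ratio_table(df: pd.DataFrame) -> pd.DataFrame:
    """Drop spacer rows, renumber the index and convert value columns to floats."""
    # Rows that are entirely NaN are the visual spacers from the HTML table
    out = df.dropna(how="all").reset_index(drop=True)
    # The first column holds the row labels; every other column holds figures
    value_cols = out.columns[1:]
    # errors="coerce" turns the "—" placeholder (and any other non-number) into NaN
    out[value_cols] = out[value_cols].apply(pd.to_numeric, errors="coerce")
    return out


# Stand-in for the scraped frame (an assumption for illustration only)
raw = pd.DataFrame(
    {
        "Margins % of Sales": [np.nan, "Revenue", np.nan, "COGS", np.nan, "R&D"],
        "2012-12": [np.nan, "100.00", np.nan, "93.23", np.nan, "—"],
        "2013-12": [np.nan, "100.00", np.nan, "93.12", np.nan, "—"],
    }
)

clean = clean_ratio_table(raw)
print(clean)
```

With the real data you would call clean_ratio_table(df) on the frame returned by pd.read_html.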

CodePudding user response:

On the website, clicking the link with the text Full Key Ratios Data opens an adjacent tab. So you have to switch to the new tab, inducing WebDriverWait for number_of_windows_to_be(2), and then, using a DataFrame from pandas, you can use the following solution:

  • Code Block:

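    from io import StringIO
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    driver.get("https://www.morningstar.com/stocks/xnas/amzn/quote")
    # Remember the original tab so the new one can be told apart later
    windows_before  = driver.current_window_handle
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='keyStats']"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(., 'Full Key Ratios Data')]"))).click()
    # The link opens a second tab: wait for it, then switch to it
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]
    driver.switch_to.window(new_window)
    data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='r_table1 text2 print97']"))).get_attribute("outerHTML")
    # read_html returns a list of DataFrames, one per table found
    df  = pd.read_html(StringIO(data))
    print(df)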
    
  • Console Output:

    [     Margins % of Sales 2012-12 2013-12 2014-12 2015-12 2016-12 2017-12 2018-12 2019-12 2020-12 2021-12     TTM
    0                   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    1               Revenue  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00  100.00
    2                   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    3                  COGS   93.23   93.12   93.04   91.21   89.69   89.84   86.75   86.16   86.66   85.89   86.59
    4                   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    5          Gross Margin    6.77    6.88    6.96    8.79   10.31   10.16   13.25   13.84   13.34   14.11   13.41
    6                   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    7                  SG&A    5.41    5.72    6.61    6.54    7.11    7.73    7.79    8.58    7.43    8.81    9.23
    8                   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    9                   R&D       —       —       —       —       —       —       —       —       —       —       —
    10                  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    11                Other    0.26    0.15    0.15    0.16    0.12    0.12    0.13    0.07   -0.02    0.01    0.06
    12                  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    13     Operating Margin    1.11    1.00    0.20    2.09    3.08    2.31    5.33    5.18    5.93    5.30    4.12
    14                  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    15  Net Int Inc & Other   -0.22   -0.32   -0.32   -0.62   -0.22   -0.17   -0.50   -0.20    0.33    2.82    0.61
    16                  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
    17           EBT Margin    0.89    0.68   -0.12    1.47    2.86    2.14    4.84    4.98    6.26    8.12    4.73
    18                  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN]
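
The window-switching step above can be wrapped in a small helper. This is a minimal sketch, assuming exactly one new tab was opened by the click. The name switch_to_new_window is hypothetical; it relies only on the window_handles attribute and the switch_to.window() method that Selenium drivers expose, so it is exercised here with a stand-in object instead of a real browser.

```python
from types import SimpleNamespace


def switch_to_new_window(driver, known_handles):
    """Switch the driver to the single window handle not in known_handles and return it."""
    new_handles = [h for h in driver.window_handles if h not in known_handles]
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new_handles)}")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]


# Stand-in driver (an assumption for illustration): it records which handle was selected
switched = []
fake_driver = SimpleNamespace(
    window_handles=["tab-1", "tab-2"],
    switch_to=SimpleNamespace(window=switched.append),
)

handle = switch_to_new_window(fake_driver, known_handles={"tab-1"})
print(handle, switched)
```

With a real driver you would capture set(driver.window_handles) before clicking the link, wait for EC.number_of_windows_to_be(2), and then call the helper with that set. Raising on an unexpected number of new handles makes a missing or duplicated tab fail loudly instead of silently picking the wrong one.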
    