How to click a button on a webpage and iterate through the contents after clicking it, using Python


I am using Python Selenium to scrape https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL, but I want the Quarterly data instead of the Annual data, which requires clicking the "Quarterly" button at the top right. This is my code so far:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

def readQuarterlyBSData(ticker):
    url = 'https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL'
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(url)
    # Wait for the "Quarterly" toggle to become clickable, then click it
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    ls = []
    # Trying to iterate through each div after clicking on the Quarterly button, but the content is still Annual data
    for element in soup.find_all('div'):
        ls.append(element.string)  # add each element's text to the list
    return ls

The button click succeeds, but when I iterate through the divs I am still getting the Annual data rather than the Quarterly data. Can someone show me how I can iterate through the Quarterly data?

CodePudding user response:

soup = BeautifulSoup(driver.page_source, 'lxml')

You don't need to pass driver.page_source to BS4; Selenium itself can extract the data via the driver.find_element / driver.find_elements functions (see the sketch below).

Here is the doc on that: https://selenium-python.readthedocs.io/locating-elements.html
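
For example, a minimal sketch of the Selenium-only approach, continuing from the driver set up in the question, might look like this. The data-test="fin-row" selector is an assumption about Yahoo Finance's current markup, so verify it in your browser's DevTools before relying on it:

# Sketch: extract the balance-sheet rows with Selenium directly, no BS4.
# The row selector is an assumed attribute of Yahoo Finance's markup and
# may need adjusting if the page changes.
rows = driver.find_elements(By.XPATH, '//div[@data-test="fin-row"]')
for row in rows:
    print(row.text)  # row label followed by the period values, as plain text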

Also, you are not waiting for the page source to update after the click; you are only waiting for the button to become clickable. Once it is clicked, you immediately read driver.page_source before the page has re-rendered, so you still see the Annual data. Add a delay after the click:

import time

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
time.sleep(10)  # crude, but gives the page time to re-render before reading page_source
soup = BeautifulSoup(driver.page_source, 'lxml')
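
If you'd rather not guess at a delay, an explicit wait is more reliable. Below is a minimal sketch that waits until the old table element goes stale after the click, i.e. until Yahoo has replaced it with the quarterly one. EC.staleness_of is standard Selenium; the XPath for the table container is an assumption about Yahoo Finance's markup and should be verified in DevTools:

# Sketch: replace the fixed sleep with an explicit wait for the table to refresh.
# Assumes the `driver` from the question; the tableBody XPath is a guess at
# Yahoo Finance's markup and may need adjusting.
old_table = driver.find_element(By.XPATH, '//div[contains(@class, "tableBody")]')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
WebDriverWait(driver, 20).until(EC.staleness_of(old_table))  # old table detached => page re-rendered
soup = BeautifulSoup(driver.page_source, 'lxml')  # now contains the Quarterly data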

Hope it helps :)
