How to scrape the HTML of a webpage after each button click with Selenium and Python

Time:08-31

I am new to Selenium and I am having a hard time. I have 3 buttons in total, and after each click I want to get the entire HTML of the page (I am familiar with BeautifulSoup, so I will do the filtering I need with it).

This is the part of the page's HTML that I want to scrape:

<ul>
   <li>
      <a href="#">Banana</a> <!-- button 1 -->
   </li>
   <li>
      <a href="#">Apple</a> <!-- button 2 -->
   </li>
   <li>
      <a href="#">Orange</a> <!-- button 3 -->
   </li>
</ul>

I tried this:

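from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()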
options.headless = True
options.binary_location = r'/bin/firefox'

driver = webdriver.Firefox(options=options)

driver.set_window_size(2500, 2500)

driver.get('https://blabla.bla')

all_htmls = []

driver.implicitly_wait(0.8)
driver.find_element(By.XPATH, '//*[@id="vue-container"]/section/div/header/ul/li[1]').click()
all_htmls.append(driver.page_source)

driver.implicitly_wait(0.8)
driver.find_element(By.XPATH, '//*[@id="vue-container"]/section/div/header/ul/li[2]').click()
all_htmls.append(driver.page_source)                        

driver.implicitly_wait(0.8)
driver.find_element(By.XPATH, '//*[@id="vue-container"]/section/div/header/ul/li[3]').click()
all_htmls.append(driver.page_source)

But no luck. It behaves strangely: sometimes all_htmls has only 1 element, sometimes 2, but never 3.

CodePudding user response:

I would try finding each element by its text, since that is the clearest difference between the buttons. Finding an element by text: https://www.browserstack.com/guide/find-element-by-text-using-selenium
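
As a rough illustration of locating the buttons by their text, the sketch below builds one XPath per label. The helper name `xpath_for_link_text` is made up for this example, and it assumes the labels from the question ("Banana", "Apple", "Orange") contain no double quotes.

```python
# Labels taken from the HTML snippet in the question.
BUTTON_LABELS = ["Banana", "Apple", "Orange"]


def xpath_for_link_text(label):
    """Return an XPath matching an <a> inside <ul>/<li> whose trimmed text equals label.

    Hypothetical helper; assumes label contains no double-quote character.
    """
    return '//ul/li/a[normalize-space()="{}"]'.format(label)


xpaths = [xpath_for_link_text(label) for label in BUTTON_LABELS]

# With a live driver, each XPath could be used like this:
#   driver.find_element(By.XPATH, xpath_for_link_text("Banana")).click()
```

Selenium's built-in `By.LINK_TEXT` locator is another option when the link text matches exactly.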

Also, if the buttons take you to a different address, you may need to return to the initial page before the next click.

CodePudding user response:

I found a solution. You need to use waits, because the page needs time to load all of its HTML elements before you can click on them.

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/main/div[3]/section/div/header/ul/li[{}]/a".format(num_of_btn))))