Using pd.read_html to read the current page

Time:10-28

I'm trying to use pd.read_html() to read the page I'm currently scraping with Selenium.

The only problem is that the page does not contain a table until a few buttons have been clicked through Selenium; only then is the table displayed.

So when I pass an argument like this:

pd.read_html('html_string')

it raises an error.

Is there a way to read the current page after the buttons have been clicked, rather than passing an HTML string as the argument?

I've also looked through the documentation and could not find anything that helps.

Thanks for reading/answering

CodePudding user response:

I would pass the driver's page source instead of an address, reading it once the page has been updated:

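from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

url = ...
button_id = ...
driver = webdriver.Chrome()  # assumed browser; any WebDriver works
driver.get(url)
button = driver.find_element(By.ID, button_id)  # assumes the button has an id
button.click()
...  # wait until the table has rendered

# read_html returns a list of DataFrames, one per <table> found
data = pd.read_html(StringIO(driver.page_source))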