Can't get all table elements using selenium webdriver


I'm trying to get all the information from this website using Python/Selenium: https://bitinfocharts.com/top-100-richest-bitcoin-addresses.html

I can successfully extract the data, but the problem is that the table has 100 rows and I only get the first 19 (the rows that are visible in the Chromium window when the page first loads).

I tried to scroll down the page like this:

from selenium.webdriver.common.keys import Keys

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.find_element_by_tag_name('body').send_keys(Keys.END)
driver.find_element_by_tag_name('body').send_keys(Keys.PAGE_DOWN)

etc. The scrolling itself works, but nothing changes: I still get only 19 of the 100 elements. I also tried changing parameters like window size, headless mode, maximized, etc., and nothing changed.

from selenium import webdriver

chrome_driver_binary = "C:\\scraping\\selenium\\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument('--lang=en')
options.add_argument("--disable-extensions")
options.add_argument('--headless')
options.add_argument('--window-size=1920,1480')  # width and height are comma-separated
options.binary_location = "C:\\Program Files (x86)\\BraveSoftware\\Brave-Browser\\Application\\brave.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_binary, options=options)

If I manually inspect the Chrome window that Selenium creates, I can see that all the elements are there, and I can watch the window scroll down to the bottom correctly.

So, where is the problem?

This is the main code that captures all the data successfully (but only the first 19 rows). I include it in case it's relevant.

TABLE_RESULT_BTC_TOP100 = soup1.find('table', id="tblOne").find('tbody')
for tr_tag in TABLE_RESULT_BTC_TOP100.find_all('tr'):
    ...  # per-row extraction omitted

CodePudding user response:

The first 19 rows are in one table; the remaining rows are in a second table, so you have to grab both.
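If you want to keep your BeautifulSoup approach, collect the rows from every table on the page instead of only tblOne. A minimal sketch against your existing soup1 object (depending on the page layout you may still need to filter out rows from unrelated tables):

# Gather <tr> rows from all tables, not just the one with id="tblOne".
rows = []
for table in soup1.find_all('table'):
    tbody = table.find('tbody')
    if tbody is not None:
        rows.extend(tbody.find_all('tr'))

for tr_tag in rows:
    ...  # same per-row extraction as before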

Also, there's no need for Selenium here: the tables are in the static HTML, so requests and pandas can handle it.

Here's how to get all 100 rows.

import requests
import pandas as pd

url = "https://bitinfocharts.com/top-100-richest-bitcoin-addresses.html"

# A browser-like User-Agent helps avoid having the request rejected.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:99.0) Gecko/20100101 Firefox/99.0",
}

# read_html() parses every <table> on the page; the top-100 list is split
# across the tables at indices 2 and 3, so grab both and concatenate them.
dfs = pd.read_html(requests.get(url, headers=headers).text, flavor="lxml")[2:4]
df = pd.concat(dfs)
print(df)

Output:

        0                                           1  ...   Outs Unnamed: 0
0     NaN                                         NaN  ...  449.0        1.0
1     NaN                                         NaN  ...   78.0        2.0
2     NaN                                         NaN  ...   77.0        3.0
3     NaN                                         NaN  ...    NaN        4.0
4     NaN                                         NaN  ...    NaN        5.0
..    ...                                         ...  ...    ...        ...
76   96.0  bc1qxv55wuzz4qsfgss3uq2zwg5y88d7qv5hg67d2d  ...    NaN        NaN
77   97.0  bc1qmjpguunz9lc7h6zf533wtjc70ync94ptnrjqmk  ...    NaN        NaN
78   98.0  bc1qyr9dsfyst3epqycghpxshfmgy8qfzadfhp8suk  ...    NaN        NaN
79   99.0  bc1q8qg2eazryu9as20k3hveuvz43thp200g7nw7qy  ...    NaN        NaN
80  100.0  bc1q4ffskt6879l4ewffrrflpykvphshl9la4q037h  ...    NaN        NaN

[100 rows x 20 columns]
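pd.concat keeps each table's original row index and column set, which is why the output above shows NaN padding and a repeated index. If you want a clean 0-99 index, reset it after the concat (an optional step; column cleanup depends on which fields you actually need):

# Optional: renumber the concatenated rows 0..99 instead of keeping each table's own index.
df = df.reset_index(drop=True)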