I'm trying to scrape URLs off of a web page, and they're located inside of a rankings table which takes a few seconds to load.
What I want to do is wait until the rankings table finishes loading, then grab it by its id and iterate over the elements.
This is the code I'm using to grab the page and wait:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome(cred_path)
driver.get(page)
# wait up to 5 seconds for the rankings table to appear
wait(driver, 5).until(EC.presence_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
soup = BeautifulSoup(driver.page_source, features='lxml')
#print(soup.prettify())
rankings = soup.find_all('div', {'class': "sc-ljMRFG hgfcNB rankings-table"})[0]
print(rankings)
As far as I can tell the code is actually working up to that point (I can visually see the table loading when the window opens), but then it throws a timeout error:
Traceback (most recent call last):
File "ethereum_scraper_dappRadarv2.py", line 377, in <module>
general_dapp_page()
File "ethereum_scraper_dappRadarv2.py", line 39, in general_dapp_page
_ = wait(driver, 5).until(EC.visibility_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
File "/Users/trentfowler/opt/anaconda3/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 89, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
0 chromedriver 0x0000000104dd4269 __gxx_personality_v0 582729
1 chromedriver 0x0000000104d5fc33 __gxx_personality_v0 106003
2 chromedriver 0x000000010491ce28 chromedriver 171560
3 chromedriver 0x00000001049523d2 chromedriver 390098
4 chromedriver 0x0000000104952591 chromedriver 390545
5 chromedriver 0x00000001049846b4 chromedriver 595636
6 chromedriver 0x000000010496f9fd chromedriver 510461
7 chromedriver 0x0000000104982462 chromedriver 586850
8 chromedriver 0x000000010496fc23 chromedriver 511011
9 chromedriver 0x000000010494575e chromedriver 337758
10 chromedriver 0x0000000104946a95 chromedriver 342677
11 chromedriver 0x0000000104d908ab __gxx_personality_v0 305803
12 chromedriver 0x0000000104da7863 __gxx_personality_v0 399939
13 chromedriver 0x0000000104dacc7f __gxx_personality_v0 421471
14 chromedriver 0x0000000104da8bba __gxx_personality_v0 404890
15 chromedriver 0x0000000104d84e51 __gxx_personality_v0 258097
16 chromedriver 0x0000000104dc4158 __gxx_personality_v0 516920
17 chromedriver 0x0000000104dc42e1 __gxx_personality_v0 517313
18 chromedriver 0x0000000104ddb6f8 __gxx_personality_v0 612568
19 libsystem_pthread.dylib 0x00007fff205d18fc _pthread_start 224
20 libsystem_pthread.dylib 0x00007fff205cd443 thread_start 15
(Note that the subsequent rankings = and print statements are not executed, so far as I can tell.)
My current interpretation is that selenium is executing the wait command just fine but then times out because there are no further instructions given directly to it (i.e. I'm not invoking click() on anything).
I've RTFM, but the selenium documentation is awfully sparse. Is there really no concept of waiting until an element loads and then moving on to some other processing task? Do I have to interact with the element in some way, and if so, what would be the best kind of interaction, given that all I really want is to iterate over the internal elements?
CodePudding user response:
Presumably you are using the wrong locator, as sc-ljMRFG hgfcNB rankings-table can't be the value of the ID attribute (an id can't contain spaces) but is most likely the value of the class attribute, i.e. three space-separated class names.
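In other words, the element is presumably rendered along these lines (illustrative markup, not taken from the actual page):

<div class="sc-ljMRFG hgfcNB rankings-table">...</div>

An ID locator looks for id="sc-ljMRFG hgfcNB rankings-table", which no element has, so the wait keeps polling until it times out.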
So effectively you need to change:
wait(driver, 5).until(EC.presence_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
so that it induces a WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:
Using CLASS_NAME:
wait(driver, 5).until(EC.visibility_of_element_located((By.CLASS_NAME, 'rankings-table')))
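(Note that By.CLASS_NAME matches a single class name, which is why only rankings-table is passed here; a compound value such as sc-ljMRFG hgfcNB rankings-table will not work.)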
Using CSS_SELECTOR:
wait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.sc-ljMRFG.hgfcNB.rankings-table')))
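Once the wait returns, no interaction with the element is required: until() simply blocks until the condition is met (and returns the matched WebElement), after which your script carries on. A minimal end-to-end sketch of the flow you described, assuming the class names above are stable and that the URLs sit in a href attributes inside the table (both assumptions about the page, not verified):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome(cred_path)  # cred_path and page are your existing variables
driver.get(page)

# Block until the rankings table is rendered, up to 10 seconds.
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, '.sc-ljMRFG.hgfcNB.rankings-table')))

# The table is now in the DOM; hand the rendered source to BeautifulSoup.
soup = BeautifulSoup(driver.page_source, features='lxml')
rankings = soup.find('div', class_='rankings-table')

# Iterate over the links inside the table and collect their URLs.
urls = [a['href'] for a in rankings.find_all('a', href=True)]
print(urls)

Waiting on the chained CSS selector and then parsing driver.page_source keeps the Selenium side to a minimum; once the DOM is ready, BeautifulSoup handles the iteration.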