I'm trying to scrape data from Instagram with Selenium, but I need my program to fully load the page so it can actually access the data.
I'm looping through a list of URLs and want to extract the post and follower counts. It looks something like this:
users = []
for i in urls:
    driver.get(i)
    sleep(1)  # even with this sleep, it doesn't always load enough of the page, and I'd prefer not to sleep too long on every page
    header = driver.find_elements(By.CLASS_NAME, "_aa_6")
    number = header[0].text
    number = int(number.replace("\nposts", ""))
    if number >= 10:
        followers = header[1].text
        tup = (i, followers)
        users.append(tup)
It works sometimes with the 1-second sleep, but it's hit or miss. I was wondering if Selenium has some way to force the page to load. However, I wouldn't want to fully load each page either, since it doesn't need to load the Instagram pictures.
I could do a while-loop to enforce the length of header, but I was wondering if Selenium offers a better solution, or maybe Selenium isn't the best tool for this kind of task?
CodePudding user response:
You just have to use explicit waits instead of the much less reliable time.sleep():
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

header = WebDriverWait(driver, 30).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "_aa_6"))
)
[...]
It will wait until the elements are located (i.e. loaded), then return them as a list, so header[0] and header[1] work as in your code. I set the maximum wait to 30 seconds, but of course that can be adjusted.