scrolldown = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var scrolldown=document.body.scrollHeight;return scrolldown;")
match = False
while(match==False):
    last_count = scrolldown
    time.sleep(3)
    scrolldown = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var scrolldown=document.body.scrollHeight;return scrolldown;")
    if last_count==scrolldown:
        match=True
I want to scrape data from an Instagram profile with Selenium, but I don't know how to set a limit for scrolling the page. With the code above, the page keeps scrolling and I can't tell when it will stop. I just want to scroll through that account's posts until I find the one I'm looking for.
CodePudding user response:
Since you want "to scroll through that account's posts until I find the one I'm looking for", the target element presumably has a unique attribute, such as one of:
- id
- classname
- aria-label
- innerText
or can be identified uniquely within the HTML DOM by a combination of its attributes. Once you can construct a locator strategy that identifies the element uniquely, you can use the scrollIntoView() method as follows:
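from selenium.webdriver.common.by import By
# replace the placeholder XPath with the locator of your target post
element = driver.find_element(By.XPATH, "//unique_xpath_locator")
driver.execute_script("return arguments[0].scrollIntoView();", element)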
CodePudding user response:
Probably the best and safest way to scroll is to use
element = driver.find_element(...)
driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', element)
This command scrolls smoothly so that the element ends up vertically centered on the page. In your case, I suggest scrolling to the oldest loaded post (it should be at the bottom of the screen) so that new ones are loaded, and repeating the process until you find the post you are looking for. You can do this with the following code:
while True:
    loaded_posts = driver.find_elements(By.CSS_SELECTOR, 'article > div > div > div > div')
    # scroll to last loaded post
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', loaded_posts[-1])
    post_found = ...
    if post_found:
        break
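Since the question asks for a scroll limit, here is a rough sketch of how the loop above could be bounded. It is only an illustration under some assumptions: scroll_until_found, is_target, max_scrolls and pause are hypothetical names (not part of Selenium), and the selector that matches a post on the real page is up to you. The loop stops when the predicate matches a loaded post, when the scroll budget is used up, or when a scroll brings in no new posts.

```python
import time


def scroll_until_found(driver, post_selector, is_target, max_scrolls=50, pause=2.0):
    """Scroll to the last loaded post until is_target matches or a limit is hit.

    Hypothetical helper: returns the matching element, or None if the scroll
    limit was reached or no new posts appeared after a scroll.
    """
    previous_last = None
    for _ in range(max_scrolls):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR
        posts = driver.find_elements("css selector", post_selector)

        # check every currently loaded post against the caller's predicate
        for post in posts:
            if is_target(post):
                return post

        # no posts at all, or the last post did not change: nothing new loaded
        if not posts or posts[-1] == previous_last:
            return None

        previous_last = posts[-1]
        # scroll to the last loaded post so the page fetches the next batch
        driver.execute_script(
            'arguments[0].scrollIntoView({block: "center"});', previous_last
        )
        time.sleep(pause)  # crude wait for new posts; an assumption, tune as needed
    return None
```

You would call it with your own driver, the post selector from the loop above, and a predicate of your choice, for example one that checks whether the post's text contains a caption you expect. If pause is too short for the page to load new posts, the helper may give up early, so treat a None result as "not found within the limits" rather than "does not exist".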