I wanted to scroll down a web page using Selenium and found this question:
How can I scroll a web page using selenium webdriver in python?
I took the code below from it:
import time

SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
It works fine, but it causes a problem in my main code. I want to parse Twitter. When an account's timeline is long, the page's HTML contains only a few tweets at a time, not all of the account's tweets. For example, as I scroll down, the HTML contains only the tweets that are currently visible to me. Because of this, I can't capture all the tweets. The code above scrolls the page too quickly. How can I slow the scrolling down?
I tried to solve it myself and wrote this dumb code:
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)
# Scroll down in steps of 600 pixels
y = 600
finished = False
while True:
    for timer in range(0, 100):
        driver.execute_script("window.scrollTo(0, " + str(y) + ")")
        y += 600
        sleep(1)
        new_height = driver.execute_script("return document.body.scrollHeight")
        print(new_height, last_height)
        if new_height == last_height:  # on the first iteration new_height equals last_height
            print('stop')
            finished = True
            break
        last_height = new_height
    if finished:
        break
This code doesn't work: on the first iteration new_height already equals last_height. Please help me.
If you can fix my code, please do. If you can write another, more elegant solution, please write it.
UPD:
The scrolling has to work on infinite-scroll pages. For example, I scroll down a Facebook account until I have scrolled through all of it. That's why I have the last_height and new_height variables. In my code, when last_height equals new_height, it means the page has been scrolled to the end and we can stop scrolling (exit the loop). But I missed something, and my code doesn't work.
Answer:
I have worked on a Twitter bot. When you scroll down, Twitter updates the page's HTML and removes some of the tweets above. The algorithm I used is:
- Create an empty list for tweet URLs.
- Collect the tweets currently available. For each tweet, check whether its URL is already in the list. If it is not, add it and process the tweet's content however you want; otherwise, ignore that tweet.
- Get the height of the page:
current_height = DriverWrapper.cd.execute_script("return document.body.scrollHeight")
- Scroll down the page and read the height again. If
new_height == current_height
stop; otherwise, repeat from the 2nd step.
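
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions, not the exact code from my bot: `scroll_and_collect`, `extract_urls`, `step_px`, `pause_s` and `max_idle_steps` are made-up names, and `extract_urls` stands for whatever function you write to pull tweet URLs out of the current HTML. Note one deliberate change: because the page is scrolled in small steps instead of jumping to the bottom, the sketch stops when the scroll position (`window.pageYOffset`) no longer changes, rather than comparing `document.body.scrollHeight`. With small steps the height often stays the same after a single step, which is probably why the loop in the question exits on the first iteration.

```python
import time


def scroll_and_collect(driver, extract_urls, step_px=600, pause_s=1.0, max_idle_steps=3):
    """Scroll in small steps and collect unique tweet URLs along the way.

    driver       -- any object exposing Selenium's execute_script(script)
    extract_urls -- hypothetical callback: returns the URLs of the tweets
                    currently present in the page's HTML
    """
    seen = set()     # URLs already processed (fast membership test)
    ordered = []     # the same URLs in first-seen order
    idle_steps = 0   # consecutive steps with no movement and no new tweets

    while idle_steps < max_idle_steps:
        # Collect the tweets currently in the DOM, ignoring known ones.
        new_found = False
        for url in extract_urls(driver):
            if url not in seen:
                seen.add(url)
                ordered.append(url)  # process the tweet's content here
                new_found = True

        # Scroll down by one small step and give the page time to load.
        before = driver.execute_script("return window.pageYOffset")
        driver.execute_script(f"window.scrollBy(0, {int(step_px)});")
        time.sleep(pause_s)
        after = driver.execute_script("return window.pageYOffset")

        # Stop only after several steps in a row made no progress, so a
        # slow lazy-load does not end the loop too early.
        if after != before or new_found:
            idle_steps = 0
        else:
            idle_steps += 1

    return ordered
```

With a real Selenium driver you would call it as `scroll_and_collect(driver, my_extract_function)`. A larger `pause_s` or a smaller `step_px` slows the scrolling down; good values depend on how fast the page loads new tweets.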