I am trying to scrape tweets under a hashtag using Python and Selenium, and I use the following code to scroll down:
driver.execute_script('window.scrollTo(0,document.body.scrollHeight);')
The problem is that Selenium only scrapes the visible tweets (just 3 of them), then scrolls to the end of the page, loads more tweets, and scrapes 3 new ones, missing a lot of tweets in between.
Is there a way to show all tweets, then scroll down and show all new tweets, or at least some new ones (I have a mechanism to filter already-scraped tweets)?
Note: I'm running my script on a GCP VM, so I can't rotate the screen.
I think I could make the script keep pressing the down-arrow key; that way I could display tweets one by one, scrape them, and keep loading more. But I suspect this would slow the scraper down a lot.
CodePudding user response:
Scroll down the page by a fixed number of pixels at a time, so the page gets time to load the data. Try the code below:
from time import sleep

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollBy(0, 800);")  # increase or decrease the scroll step ('800') as needed
    sleep(1)  # give the page time to load new tweets
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
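To avoid missing tweets between scroll steps, you can scrape inside the loop above and deduplicate as you go (the question mentions such a filter already exists). Below is a minimal sketch of the deduplication helper; `extract_tweets(driver)` in the commented usage is a hypothetical function that returns the tweet texts currently in the DOM, not a Selenium API.

```python
def filter_new(tweets, seen):
    """Return only tweets not seen before, and remember them in `seen`."""
    fresh = [t for t in tweets if t not in seen]
    seen.update(fresh)
    return fresh

# Usage inside the incremental-scroll loop (requires a live driver;
# `extract_tweets` is a placeholder for your own DOM-scraping code):
#
# seen = set()
# collected = []
# last_height = driver.execute_script("return document.body.scrollHeight")
# while True:
#     collected += filter_new(extract_tweets(driver), seen)
#     driver.execute_script("window.scrollBy(0, 800);")
#     sleep(1)
#     new_height = driver.execute_script("return document.body.scrollHeight")
#     if new_height == last_height:
#         break
#     last_height = new_height
```

Scraping on every iteration keeps the step size small enough that no batch of tweets is skipped, while the `seen` set prevents duplicates.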
CodePudding user response:
To scroll down the page in Selenium, we can scroll to a given element's coordinates:

driver.execute_script(
    "window.scrollTo(" + str(data.location["x"]) + ", " + str(data.location["y"]) + ")")

Here data is the tweet element that we retrieved; element.location in Selenium is a dict with "x" and "y" keys.
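Building the JavaScript string by concatenation is easy to get wrong, so a small helper can make it safer to construct. This is a sketch; `scroll_to_js` is a hypothetical helper, not part of Selenium:

```python
def scroll_to_js(location):
    """Build a window.scrollTo(...) call from an element's .location dict."""
    return "window.scrollTo(%d, %d)" % (location["x"], location["y"])

# Usage with a live driver and a tweet WebElement:
# driver.execute_script(scroll_to_js(tweet.location))
#
# Alternatively, Selenium can scroll an element into view directly:
# driver.execute_script("arguments[0].scrollIntoView(true);", tweet)
```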