Scraping tweets using Python Selenium

Time:12-01

I am trying to scrape tweets under a hashtag using Python Selenium, and I use the following code to scroll down:

driver.execute_script('window.scrollTo(0,document.body.scrollHeight);')

The problem is that Selenium only scrapes the tweets currently shown (only 3), then scrolls to the end of the page, loads more tweets, and scrapes 3 new ones, missing many tweets in between.

Is there a way to show all tweets, then scroll down and show all the new tweets, or at least some of them? (I have a mechanism to filter out tweets that were already scraped.)
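Filtering out already scraped tweets is what makes overlapping scroll steps safe: if each pass may see some of the same tweets again, duplicates are simply dropped. A minimal sketch, assuming each scraped tweet is represented as a dict with a unique "id" key (a hypothetical format, e.g. an ID taken from the tweet's link); filter_new_tweets is an illustrative helper, not part of Selenium.

```python
from typing import Iterable


def filter_new_tweets(batch: Iterable[dict], seen_ids: set[str]) -> list[dict]:
    """Return the tweets in `batch` whose "id" was not seen before, and record them.

    Assumes every tweet is a dict with a unique "id" key (hypothetical format).
    """
    new_tweets = []
    for tweet in batch:
        if tweet["id"] not in seen_ids:
            seen_ids.add(tweet["id"])  # also removes duplicates within the same batch
            new_tweets.append(tweet)
    return new_tweets


seen: set[str] = set()
first_pass = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
second_pass = [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}]  # overlaps the first pass
print([t["id"] for t in filter_new_tweets(first_pass, seen)])   # ['1', '2']
print([t["id"] for t in filter_new_tweets(second_pass, seen)])  # ['3']
```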

Note: I'm running my script on a GCP VM, so I can't rotate the screen.

I think I could make the script keep pressing the down arrow key, so that tweets are displayed one by one and scraped while more tweets keep loading, but I expect this would slow the scraper down considerably.

Answer:

Scroll down the page a fixed number of pixels at a time, so the page has time to load the data between steps. Try the code below:

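from time import sleep

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollBy(0, 800);")  # you can increase or decrease the scrolling height, i.e. '800'
    sleep(1)  # give the page time to load new tweets
    # scrape the tweets that are currently visible here
    new_height = driver.execute_script("return document.body.scrollHeight")
    at_bottom = driver.execute_script(
        "return window.innerHeight + window.pageYOffset >= document.body.scrollHeight - 2;")
    if at_bottom and new_height == last_height:
        break  # at the bottom of the page and nothing new was loaded
    last_height = new_height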
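The key point is to scrape after every small scroll step rather than once at the end, so tweets that scroll past are not skipped. Below is a minimal sketch of that loop, written against two hypothetical callables so it can be tried without a browser: scroll_step would wrap driver.execute_script("window.scrollBy(0, 800);"), and read_visible would return (tweet_id, text) pairs for the tweets currently on the page (how you locate them is up to your own selectors). Instead of comparing page heights, this variant stops after a few consecutive steps that yield no unseen tweets; that stopping rule is an assumption, not something from the answer above.

```python
import time
from typing import Callable, Iterable


def scroll_and_collect(
    scroll_step: Callable[[], None],
    read_visible: Callable[[], Iterable[tuple[str, str]]],
    max_idle_rounds: int = 3,
    pause: float = 1.0,
) -> dict[str, str]:
    """Scrape after every small scroll step instead of jumping to the bottom.

    scroll_step:  scrolls a fixed distance (hypothetical wrapper around window.scrollBy)
    read_visible: returns (tweet_id, text) pairs for the tweets currently on the page
    Stops after `max_idle_rounds` consecutive steps that yield no unseen tweet IDs.
    """
    collected: dict[str, str] = {}
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        found_new = False
        for tweet_id, text in read_visible():
            if tweet_id not in collected:  # skip tweets seen in an overlapping step
                collected[tweet_id] = text
                found_new = True
        idle_rounds = 0 if found_new else idle_rounds + 1
        scroll_step()
        time.sleep(pause)  # give the page time to load and render more tweets
    return collected


# Usage with a simulated page (10 tweets, 3 visible at a time, each step moves 2),
# standing in for a real Selenium driver.
tweets = [(str(i), f"tweet {i}") for i in range(10)]
position = {"top": 0}


def fake_scroll() -> None:
    position["top"] += 2


def fake_visible() -> list[tuple[str, str]]:
    return tweets[position["top"]:position["top"] + 3]


result = scroll_and_collect(fake_scroll, fake_visible, max_idle_rounds=2, pause=0)
print(len(result))  # 10
```

Because each step moves less than one screen, consecutive views overlap and no tweet falls between two reads; the dictionary keyed by ID removes the duplicates that the overlap produces.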

Answer:

To scroll the page in Selenium to a tweet you have already found, you can scroll to that element's position:

driver.execute_script(
        "window.scrollTo(" + str(data.location["x"]) + ", " + str(data.location["y"]) + ")")

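Here data is a tweet element (a Selenium WebElement) that has already been located; data.location holds its x and y coordinates on the page.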
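A sketch of two ways to scroll to an already located tweet element. The first builds the same window.scrollTo call with an f-string instead of string concatenation; the second passes the element itself to execute_script, which Selenium makes available to the script as arguments[0], and calls the DOM method scrollIntoView. The helper names are hypothetical, and driver and the element are assumed to come from an existing Selenium session.

```python
def scroll_to_location_script(location: dict) -> str:
    """Build a window.scrollTo(...) call from a WebElement.location dict ({'x': ..., 'y': ...})."""
    return f"window.scrollTo({location['x']}, {location['y']});"


def scroll_to_element(driver, element) -> None:
    """Scroll `element` into view by handing it to the script as arguments[0]."""
    driver.execute_script("arguments[0].scrollIntoView(true);", element)


# With a real session (driver, data and tweet_elements are assumed to exist):
#   driver.execute_script(scroll_to_location_script(data.location))
#   scroll_to_element(driver, tweet_elements[-1])  # last tweet found so far
print(scroll_to_location_script({"x": 0, "y": 1200}))  # window.scrollTo(0, 1200);
```

Passing the element avoids reading its coordinates in Python and formatting them back into JavaScript; scrolling to the last tweet found so far is one way to bring the next tweets into view.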
