It seems Twitter has made a lot of changes to its API and web app, so many methods that used to work no longer do.
Which method works as of today (April 2022) to scrape historical tweets from specific users, going back at least 4 years?
- Selenium scraping: open https://www.twitter.com/ and scroll down until you have the history you need
  - not working, only 1 month of data
- https://pypi.org/project/GetOldTweets3/ - Twitter scraping of older tweets
  - seems to have stopped working? https://github.com/Mottl/GetOldTweets3/issues/98
- Twitter API v2, Academic Research, full archive (free): https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api
  - ?
- Twitter Premium API 1.1 full archive (paid): https://developer.twitter.com/en/docs/twitter-api/premium/search-api/overview
  - ?
CodePudding user response:
There is a Python library named Twint. It's an unofficial scraping tool that works without Twitter's official API, so you won't need a Twitter developer account or any access token.
Twint can help you get older tweets and offers some advanced search controls.
If you run into rate restrictions from Twitter when fetching a large number of tweets, try breaking your search into smaller date ranges, for example weekly or bi-weekly windows such as 10-10-2021 to 17-10-2021.
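A minimal sketch of the chunking idea, in case it helps. The helper names `date_chunks` and `scrape_user` and the output file name are made up for illustration; the Twint part assumes Twint is installed and can still reach Twitter, which may not hold for every version.

```python
from datetime import date, timedelta


def date_chunks(start, end, days=7):
    """Yield (since, until) date pairs covering start..end in windows of `days` days."""
    step = timedelta(days=days)
    since = start
    while since < end:
        # Clamp the last window so it never runs past `end`.
        until = min(since + step, end)
        yield since, until
        since = until


def scrape_user(username, start, end, days=7, output="tweets.csv"):
    """Hypothetical helper: run one Twint search per date window."""
    # Assumption: twint is installed; imported here so date_chunks works without it.
    import twint

    for since, until in date_chunks(start, end, days):
        c = twint.Config()
        c.Username = username
        c.Since = since.isoformat()   # YYYY-MM-DD
        c.Until = until.isoformat()
        c.Store_csv = True
        c.Output = output
        c.Hide_output = True
        twint.run.Search(c)
```

You would call it with something like `scrape_user("some_user", date(2018, 4, 1), date(2022, 4, 1))`, where `some_user` is a placeholder. If a window still gets throttled, a smaller `days` value is the first thing to try.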
I have a sample GitHub repo for that. Link
Hopefully, this can help!