Timespan for Elevated Access to Historical Twitter Data

Time:02-22

I have an academic developer account, and my Twitter developer profile shows Elevated at the top, but when I use Tweepy to retrieve tweets, it only returns tweets from the past 7 days. How can I extend my access back to 2006?

This is my code:

import tweepy
from tweepy import OAuthHandler
import pandas as pd


access_token = '#'
access_token_secret = '#'
API_key = '#'
API_key_secret = '#'

auth = tweepy.OAuthHandler(API_key, API_key_secret)
auth.set_access_token(access_token, access_token_secret)


api = tweepy.API(auth, wait_on_rate_limit=True)

tweets = []

count = 1


try:
    # The standard v1.1 search returns at most 100 tweets per request, so count is capped at 100
    for tweet in tweepy.Cursor(api.search_tweets, q="#SEARCHQUERY", count=100).items(50000):

        print(count)
        count += 1

        data = (tweet.created_at, tweet.id, tweet.text,
                tweet.user._json['screen_name'], tweet.user._json['name'],
                tweet.user._json['created_at'], tweet.entities['urls'])
        tweets.append(data)

# Tweepy v4 (which provides api.search_tweets) raises TweepyException; TweepError was removed
except tweepy.TweepyException as e:
    print(e)

df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', 'screen_name', 'name', 'account_creation_date', 'urls'])

df.to_csv(path_or_buf = 'local address/file.csv', index=False)

CodePudding user response:

Tweepy's api.search_tweets does not support retrieving tweets that are older than one week (see source).

This page announcing version 2 of the Twitter API mentions a GET /2/tweets/search/all endpoint that is only available to academic researchers.

You should use the requests module to make GET requests to https://api.twitter.com/2/tweets/search/all, and use .json() to parse the resulting response.

import requests

url = 'https://api.twitter.com/2/tweets/search/all'
headers = {'Authorization': 'Bearer <bearer token>'}
params = {'query': '<query text>'}

r = requests.get(url, headers=headers, params=params)
print(r.json())

I don't have academic research access, so I can't directly test this, but this code does work when sending requests to the GET /2/tweets/search/recent endpoint.
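
A single request only returns one page of results, so reaching back to 2006 would also need a start_time and a loop that follows the next_token returned in the response's meta object. The following is a minimal sketch of that loop, not tested against the real endpoint. The function name search_all, the get, pause and max_pages parameters, and the example start date are assumptions made for illustration; query, start_time, max_results, tweet.fields and next_token are request parameters of the v2 search endpoints as documented at the time.

```python
import time

SEARCH_ALL_URL = 'https://api.twitter.com/2/tweets/search/all'


def search_all(query, bearer_token, start_time='2006-03-21T00:00:00Z',
               max_pages=5, get=None, pause=1.0):
    """Collect tweets page by page by following meta.next_token."""
    if get is None:
        import requests  # imported lazily so a stub can be passed in for dry runs
        get = requests.get

    headers = {'Authorization': f'Bearer {bearer_token}'}
    params = {
        'query': query,
        'start_time': start_time,  # if omitted, the endpoint defaults to the last 30 days
        'max_results': 100,
        'tweet.fields': 'created_at,author_id,entities',
    }

    tweets = []
    for _ in range(max_pages):
        r = get(SEARCH_ALL_URL, headers=headers, params=params)
        r.raise_for_status()
        page = r.json()
        tweets.extend(page.get('data', []))  # 'data' is absent when a page is empty
        next_token = page.get('meta', {}).get('next_token')
        if not next_token:
            break  # no further pages
        params['next_token'] = next_token
        time.sleep(pause)  # assumed pause to stay under the per-second request limit
    return tweets
```

Calling search_all('<query text>', '<bearer token>') would then return a list of tweet dictionaries that can be passed straight to pd.DataFrame.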


For those who don't have academic research access, some other solutions involving third-party tools are discussed here.

CodePudding user response:

The Search All endpoint is available in Twitter API v2, which is represented by the tweepy.Client object (you are using tweepy.API, which targets v1.1).

The most important thing is that you need Academic Research access from Twitter. Elevated access grants additional request volume, plus access to the v1.1 APIs on top of v2 (Essential) access, but you will need an account and a Project with Academic Research access to call the endpoint. You can apply for it in the Twitter Developer Portal.
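
Once a Project has Academic Research access, a call through tweepy.Client could look roughly like the sketch below. It is untested against the live API. The helper name fetch_archive, the max_pages parameter and the ARCHIVE_START date are assumptions made for illustration, while search_all_tweets, start_time, max_results, tweet_fields and next_token come from Tweepy's v2 client.

```python
from datetime import datetime, timezone


def fetch_archive(client, query, start_time, max_pages=5):
    """Page through client.search_all_tweets and return rows of tweet fields."""
    rows = []
    kwargs = {
        'start_time': start_time,
        'max_results': 100,
        'tweet_fields': ['created_at', 'author_id', 'entities'],
    }
    for _ in range(max_pages):
        response = client.search_all_tweets(query, **kwargs)
        for tweet in response.data or []:  # data is None when a page has no results
            rows.append((tweet.created_at, tweet.id, tweet.text, tweet.author_id))
        next_token = response.meta.get('next_token')
        if not next_token:
            break  # no further pages
        kwargs['next_token'] = next_token
    return rows


# Illustrative start date, chosen here to reach back to 2006
ARCHIVE_START = datetime(2006, 3, 21, tzinfo=timezone.utc)
```

With a real client this might be called as fetch_archive(tweepy.Client(bearer_token='#', wait_on_rate_limit=True), '#SEARCHQUERY', ARCHIVE_START), and the rows can be loaded into a DataFrame as in the question. Tweepy also provides tweepy.Paginator, which wraps the same token-following loop.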
