I want to crawl Twitter by keyword using the Twitter API. I'm using the Twitter Search API (the v2 recent search endpoint):
```python
import requests

# BEARER_TOKEN is assumed to be defined earlier (your app's bearer token)
query = 'football'
tweet_fields = "author_id,created_at,text,public_metrics,possibly_sensitive,source,lang"
max_results = "50"

# search recent Tweets matching the query
headers = {"Authorization": "Bearer {}".format(BEARER_TOKEN)}
url = "https://api.twitter.com/2/tweets/search/recent?query={}&tweet.fields={}&max_results={}".format(query, tweet_fields, max_results)
response = requests.request("GET", url, headers=headers)
print("Response Status Code:", response.status_code)
if response.status_code != 200:
    raise Exception(response.status_code, response.text)

# print(response.json())
twitter_search_data = response.json()['data']
for data in twitter_search_data:
    print(data)
```
I'm getting good results, but I also want to get the author's username. For now I can only get `author_id`.
I have tried adding `expansions=author_id&user.fields={}` to my API URL, but I don't get those results:

```python
user_fields = "description,username"
url = "https://api.twitter.com/2/tweets/search/recent?query={}&tweet.fields={}&expansions=author_id&user.fields={}&max_results={}".format(query, tweet_fields, user_fields, max_results)
```
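As a side note, building the URL with `str.format` does not escape the values, so a `#` or `&` inside the query would be misread as a URL fragment or a parameter separator. A minimal sketch using only the standard library's `urllib.parse.urlencode` builds the same request with `expansions=author_id` and `user.fields` set; the helper name `build_search_url` is hypothetical. With `requests` you could equivalently pass the same dictionary as `params=` to `requests.get`.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_url(query, tweet_fields, user_fields, max_results):
    """Return the recent-search URL with the author expansion requested."""
    params = {
        "query": query,
        "tweet.fields": tweet_fields,
        "expansions": "author_id",  # ask for the author user objects
        "user.fields": user_fields,
        "max_results": max_results,
    }
    # urlencode percent-escapes commas, spaces, '#', '&', etc. in each value
    return SEARCH_URL + "?" + urlencode(params)

url = build_search_url(
    "football",
    "author_id,created_at,text,public_metrics,possibly_sensitive,source,lang",
    "description,username,public_metrics",
    50,
)
print(url)
```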
This is an example result:

```
{'possibly_sensitive': False, 'source': 'Twitter for Android', 'lang': 'en', 'public_metrics': {'retweet_count': 1, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}, 'created_at': '2021-10-05T12:23:05.000Z', 'id': '1445363916457005058', 'text': 'RT @COiNSTANTIN1: @MEXC_Global @PolkaExOfficial Check out @MiniFootballBsc We are bringing together the football and crypto community.\n⚽️Fa…', 'author_id': '1444275133854715912'}
```
Is there a way to change my Twitter API request so that I also get:

1. the author's username
2. the author's name
3. the author's number of followers
4. the number of accounts the author follows
Answer:
You're close to what you need in your code, but the user information you're requesting via `expansions` is delivered in a separate top-level `includes` object (as a list under `includes['users']`), and you're missing it because your code only prints each value in the `data` array.
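To make that layout concrete, here is a minimal sketch with a hypothetical, heavily trimmed response body (made-up IDs and names) shaped the way the v2 API returns expanded data: Tweets under `data`, users under `includes['users']`. The helper `get_expanded_users` is illustrative and not part of any library.

```python
# Hypothetical, trimmed response body; all values are made up.
sample_response = {
    "data": [
        {"id": "1", "text": "first tweet", "author_id": "100"},
        {"id": "2", "text": "second tweet", "author_id": "200"},
    ],
    "includes": {
        "users": [
            {"id": "100", "name": "Alice", "username": "alice"},
            {"id": "200", "name": "Bob", "username": "bob"},
        ]
    },
}

def get_expanded_users(json_response):
    """Return the user objects delivered via expansions (empty list if none)."""
    return json_response.get("includes", {}).get("users", [])

for user in get_expanded_users(sample_response):
    print(user["id"], user["username"], user["name"])
```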
If you want the metrics (number of followers and followings for each user), you will also want to add the `public_metrics` user field to your query:

```python
user_fields = "description,username,public_metrics"
```
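The four values asked about map onto user-object fields: `username` and `name` (returned by default for expanded users, together with `id`), plus `followers_count` and `following_count` inside `public_metrics`. A small sketch, assuming `public_metrics` was requested; `summarize_user` and the sample values are hypothetical.

```python
def summarize_user(user):
    """Pick username, name, follower and following counts from a v2 user object.

    Missing fields come back as None (e.g. if public_metrics was not requested).
    """
    metrics = user.get("public_metrics", {})
    return {
        "username": user.get("username"),
        "name": user.get("name"),
        "followers": metrics.get("followers_count"),
        "following": metrics.get("following_count"),
    }

# Hypothetical user object (made-up values) as it might appear in includes["users"]
example_user = {
    "id": "100",
    "name": "Alice",
    "username": "alice",
    "public_metrics": {"followers_count": 42, "following_count": 7,
                       "tweet_count": 1000, "listed_count": 3},
}
print(summarize_user(example_user))
```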
Then you can either list out the `includes` separately, or do some matching to combine each user object with its Tweet. The simplest thing to do would be:

```python
print(response.json()['data'])
print(response.json()['includes'])
```
You can match each user with the Tweet data by checking the `author_id` in the Tweet object against the `id` value in the user object.
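A minimal sketch of that matching, assuming the response was requested with `expansions=author_id`: it indexes `includes['users']` by `id` once and then looks up each Tweet's `author_id`, so a user who wrote several of the Tweets is handled too. The helper `attach_authors`, the `author` key it adds, and the sample data are hypothetical; this is a hand-rolled join, not twarc's implementation.

```python
def attach_authors(json_response):
    """Return copies of the Tweets in 'data', each with an 'author' key holding
    the matching user from includes['users'] (None if no match)."""
    users = json_response.get("includes", {}).get("users", [])
    users_by_id = {user["id"]: user for user in users}
    merged = []
    for tweet in json_response.get("data", []):
        combined = dict(tweet)  # copy so the original response is untouched
        combined["author"] = users_by_id.get(tweet.get("author_id"))
        merged.append(combined)
    return merged

# Hypothetical response body; all values are made up.
sample_response = {
    "data": [
        {"id": "1", "text": "first tweet", "author_id": "100"},
        {"id": "2", "text": "second tweet", "author_id": "200"},
        {"id": "3", "text": "third tweet", "author_id": "100"},
    ],
    "includes": {
        "users": [
            {"id": "100", "name": "Alice", "username": "alice"},
            {"id": "200", "name": "Bob", "username": "bob"},
        ]
    },
}

for tweet in attach_authors(sample_response):
    author = tweet["author"]
    print(tweet["id"], author["username"] if author else None, tweet["text"])
```

With a live response you would call `attach_authors(response.json())` instead of passing the sample.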
There are also tools and libraries that can do this for you automatically; for example, the latest version of `twarc` can "flatten" this data into single objects.