Until this morning I was successfully using snscrape to scrape Twitter tweets via Python. The code looks like this:
import snscrape.modules.twitter as sntwitter

# Scrape all tweets posted by the account @annewilltalk
query = "from:annewilltalk"
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    pass  # do stuff with each tweet
But this morning, without any changes on my side, I got this error:
Traceback (most recent call last):
File "twitter.py", line 568, in <module>
Scraper = TwitterScraper()
File "twitter.py", line 66, in __init__
self.get_tweets_talkshow(username = "annewilltalk")
File "twitter.py", line 271, in get_tweets_talkshow
for i,tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
File "/home/pi/.local/lib/python3.8/site-packages/snscrape/modules/twitter.py", line 1455, in get_items
for obj in self._iter_api_data('https://api.twitter.com/2/search/adaptive.json', _TwitterAPIType.V2, params, paginationParams, cursor = self._cursor):
File "/home/pi/.local/lib/python3.8/site-packages/snscrape/modules/twitter.py", line 721, in _iter_api_data
obj = self._get_api_data(endpoint, apiType, reqParams)
File "/home/pi/.local/lib/python3.8/site-packages/snscrape/modules/twitter.py", line 691, in _get_api_data
r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = self._check_api_response)
File "/home/pi/.local/lib/python3.8/site-packages/snscrape/base.py", line 221, in _get
return self._request('GET', *args, **kwargs)
File "/home/pi/.local/lib/python3.8/site-packages/snscrape/base.py", line 217, in _request
raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=from:annewilltalk&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&ext=mediaStats,highlightedLabel,hasNftAvatar,voiceInfo,enrichments,superFollowMetadata,unmentionInfo failed, giving up.
I read online that the URL must not exceed a length of about 500 characters, and the one in the error message is roughly 800 characters long. Could that be the problem? Why did that change overnight? How can I fix that?
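The only workaround I can think of so far is to catch the ScraperException from the traceback and retry after a pause. This is just a sketch (the scrape_with_retries helper is something I made up), and it only helps if the failures are transient, e.g. rate limiting, rather than a change to the endpoint itself:

import time
import snscrape.base
import snscrape.modules.twitter as sntwitter

def scrape_with_retries(query, max_attempts=3, wait_seconds=60):
    # Retry the whole scrape a few times, pausing between attempts
    for attempt in range(max_attempts):
        try:
            return list(sntwitter.TwitterSearchScraper(query).get_items())
        except snscrape.base.ScraperException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(wait_seconds)
    raise RuntimeError("All attempts failed - the endpoint itself may have changed")

tweets = scrape_with_retries("from:annewilltalk")

Note that snscrape already retries each request internally (the "4 requests ... failed" in the message), so if the endpoint itself changed, this will not help.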
CodePudding user response:
The same problem here. It was working just fine and suddenly stopped. I get the same error. This is the code I used:
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Creating list to append tweet data to
tweets_list2 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('xxxx since:2022-06-01 until:2022-06-30').get_items()):
    if i > 100000:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])

# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
print(tweets_df2)
tweets_df2.to_csv('xxxxx.csv')
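Independently of the error itself, it may be worth wrapping the loop so that anything already collected still gets written to CSV when the scraper gives up. A rough sketch, reusing the placeholders above and the ScraperException from the traceback; it does not fix the underlying problem:

import snscrape.base
import snscrape.modules.twitter as sntwitter
import pandas as pd

tweets_list2 = []
try:
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('xxxx since:2022-06-01 until:2022-06-30').get_items()):
        if i > 100000:
            break
        tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
except snscrape.base.ScraperException as e:
    # Keep whatever was collected before the scraper gave up
    print(f"Scraping stopped after {len(tweets_list2)} tweets: {e}")

# Write out whatever was collected, even if the scrape did not finish
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
tweets_df2.to_csv('xxxxx.csv')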
CodePudding user response:
I am having the same problem and found a similar question here with a possible solution and explanation, but it is not yet clear whether it completely works.
Good luck!