I'm using snscrape.modules.twitter.TwitterSearchScraper() to scrape tweets for a specific location and time interval. My code is the following:
import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i == 100:
        break
    tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
My question is whether there is a way to get only one tweet per user, because when I run the code above, some users appear more than once.
Thanks in advance!
Answer:
You could check whether tweet.user.id has already been seen before adding the tweet to your list.
Here, I added a new list (called tweets_user_ids) to store the tweet.user.id values. A tweet is appended to the tweets_list variable only if its tweet.user.id does not already exist in that new list.
Code:
import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []
max_amount_of_tweets = 100
tweets_user_ids = []  # User ids already added; used to check for and skip duplicates.
i = 0  # Counts how many tweets have been added so far.
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    # Only add the tweet if its author's id has not been added yet:
    if tweet.user.id not in tweets_user_ids:
        tweets_user_ids.append(tweet.user.id)
        tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
        i += 1  # Increment.
        # Break the loop when the max amount of tweets is reached.
        if i == max_amount_of_tweets:
            break
print(tweets_list)
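The same "check before adding" idea can also be factored into a small reusable generator. This is only a sketch and is not part of snscrape: the name first_tweet_per_user is hypothetical, and the only assumption is that each item exposes user.id, as snscrape's tweet objects do. It keeps the ids in a set instead of a list, so each membership test is an average constant-time lookup rather than a scan of every id seen so far, and itertools.islice stops pulling items once the requested number of unique users has been yielded. The SimpleNamespace objects below are made-up stand-ins so the sketch runs without network access.

```python
from itertools import islice
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def first_tweet_per_user(tweets: Iterable[Any]) -> Iterator[Any]:
    """Yield only the first tweet seen for each user id (hypothetical helper).

    Assumes every item has a `user.id` attribute, like snscrape's tweet objects.
    """
    seen_user_ids = set()  # Set membership checks are O(1) on average.
    for tweet in tweets:
        if tweet.user.id not in seen_user_ids:
            seen_user_ids.add(tweet.user.id)
            yield tweet


# Made-up stand-in tweets; user 1 appears twice.
fake_tweets = [
    SimpleNamespace(user=SimpleNamespace(id=1), rawContent="a"),
    SimpleNamespace(user=SimpleNamespace(id=2), rawContent="b"),
    SimpleNamespace(user=SimpleNamespace(id=1), rawContent="c"),
    SimpleNamespace(user=SimpleNamespace(id=3), rawContent="d"),
]

# Take at most 2 tweets, each from a different user.
unique = list(islice(first_tweet_per_user(fake_tweets), 2))
print([t.rawContent for t in unique])  # ['a', 'b']
```

With snscrape, you would wrap the scraper's iterator the same way, e.g. islice(first_tweet_per_user(sntwitter.TwitterSearchScraper(query).get_items()), 100), and then build tweets_list from the yielded tweets. Because the generator is lazy, the scraper is only asked for more tweets until 100 distinct users have been found.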