I have a Tweet class and a list that holds all my Tweet objects. Each tweet also has a user, a retweet count and an age, but those are not relevant to my question. Only the content matters.
tweet1 = Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303)
tweet2 = Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500)
tweet3 = Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200)
tweets = [tweet1, tweet2, tweet3]
I need to get a list of all the hashtags, but I only get the one from the 1st tweet with my code.
for x in tweets:
    return re.findall(r'#\w+', x.content)
Answer:
You are returning inside the loop, so the function exits after the first iteration. You need to go through all the tweets and add the hashtags to a list, then return once the loop is done:
import re

def get_hashtags(tweets):
    result = []
    for x in tweets:
        # findall returns a list, so extend keeps the result flat
        result.extend(re.findall(r'#\w+', x.content))
    return result
For sorting, you can use a defaultdict to add up the retweets per hashtag, then sort by that total.
import re
from collections import defaultdict

def get_hashtags_sorted(tweets):
    result = defaultdict(int)
    for x in tweets:
        for hashtag in re.findall(r'#\w+', x.content):
            # Add up the retweets of every tweet that uses this hashtag
            result[hashtag] += x.retweets
    # Sort the (hashtag, total) pairs by total, lowest first
    return sorted(result.items(), key=lambda x: x[1])
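To try both functions end to end, here is a minimal self-contained sketch. The Tweet class is not shown in the question, so the dataclass below is a hypothetical stand-in: the field names and their order (user, content, age, retweets) are assumptions based on the attributes used above, and your real class may differ.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for the asker's Tweet class; the field names and
# their order (user, content, age, retweets) are assumptions.
@dataclass
class Tweet:
    user: str
    content: str
    age: float
    retweets: int


def get_hashtags(tweets):
    """Collect every hashtag from every tweet, in order of appearance."""
    result = []
    for tweet in tweets:
        result.extend(re.findall(r"#\w+", tweet.content))
    return result


def get_hashtags_sorted(tweets):
    """Return (hashtag, total retweets) pairs, fewest retweets first."""
    totals = defaultdict(int)
    for tweet in tweets:
        for hashtag in re.findall(r"#\w+", tweet.content):
            totals[hashtag] += tweet.retweets
    return sorted(totals.items(), key=lambda pair: pair[1])


tweets = [
    Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303),
    Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500),
    Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200),
]

print(get_hashtags(tweets))         # ['#bigsmart', '#bigsmart', '#heart']
print(get_hashtags_sorted(tweets))  # [('#bigsmart', 220803), ('#heart', 284200)]
```

Pass reverse=True to sorted if you want the most retweeted hashtag first.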