How can I loop through a list of strings and extract three different variables? The problem I am trying to solve was part of a course, but no solution was given.
Question: How do I loop through a set of tweets and produce three different variables: "happy_tweets", "sad_tweets", and "neutral_tweets"?
I am having a lot of trouble figuring out the best way to loop through the tweets and pick out each of these types of tweets.
tweets = [
"Wow, what a great day today!! #sunshine",
"I feel sad about the things going on around us. #covid19",
"I'm really excited to learn Python with @JovianML #zerotopandas",
"This is a really nice song. #linkinpark",
"The python programming language is useful for data science",
"Why do bad things happen to me?",
"Apple announces the release of the new iPhone 12. Fans are excited.",
"Spent my day with family!! #happy",
"Check out my blog post on common string operations in Python. #zerotopandas",
"Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']
happy_tweets = 0
sad_tweets = 0
neutral_tweets = 0
for tweet in tweets:
    if happy_words in tweets:
        happy_tweets = 1
print(happy_tweets)
CodePudding user response:
I'd recommend building a new collection for each group of tweets using set comprehensions, which work like list comprehensions. You can then take the difference between the original tweets and the union of the happy and sad sets. Like so:
happy_tweets = {t for t in tweets for w in happy_words if w in t}
sad_tweets = {t for t in tweets for w in sad_words if w in t}
neutral_tweets = set(tweets) - (happy_tweets | sad_tweets)
print(list(happy_tweets))
print(list(sad_tweets))
print(list(neutral_tweets))
Which gives (the order within each list may vary, because sets are unordered):
['Apple announces the release of the new iPhone 12. Fans are excited.', "I'm really excited to learn Python with @JovianML #zerotopandas", 'Spent my day with family!! #happy', 'Freecodecamp has great coding tutorials. #skillup', 'Wow, what a great day today!! #sunshine', 'This is a really nice song. #linkinpark']
['I feel sad about the things going on around us. #covid19', 'Why do bad things happen to me?']
['Check out my blog post on common string operations in Python. #zerotopandas', 'The python programming language is useful for data science']
The reason for using sets here, rather than plain lists, for happy_tweets and sad_tweets is to prevent duplicates when several words from a list match the same tweet. A slightly more efficient, though more verbose, method would be to move these into actual for statements and break as soon as the first word matches. That does not change the time complexity significantly, though.
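The more verbose alternative described above might look like the following minimal sketch. The helper name matching_tweets is hypothetical, and the second sample string is made up so that one tweet matches two words. Because the inner loop breaks on the first match, each tweet is appended at most once and a plain list is enough.

```python
def matching_tweets(tweets, words):
    """Return the tweets that contain at least one of the given words."""
    matches = []
    for tweet in tweets:
        for word in words:
            if word in tweet:  # plain substring test, as in the comprehension
                matches.append(tweet)
                break  # first match is enough; prevents duplicate entries
    return matches


# Hypothetical sample data; the second string matches two words on purpose.
sample = [
    "Wow, what a great day today!! #sunshine",
    "great and nice",
    "Why do bad things happen to me?",
]
happy = matching_tweets(sample, ["great", "nice"])
print(happy)
```

The "great and nice" entry appears only once in the result even though both words match it.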
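As a bit of an aside, the union must be wrapped in parentheses because of operator precedence: in Python, - binds more tightly than |, so set(tweets) - happy_tweets | sad_tweets would be evaluated as (set(tweets) - happy_tweets) | sad_tweets and put the sad tweets back into the result.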
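To see the effect of the grouping, here is a small sketch using made-up placeholder strings instead of the real tweets. The names grouped and ungrouped are only for illustration.

```python
# Placeholder data, not the tweets from the question.
tweets = {"a happy one", "a sad one", "a neutral one"}
happy_tweets = {"a happy one"}
sad_tweets = {"a sad one"}

# With parentheses: subtract the union of both groups.
grouped = tweets - (happy_tweets | sad_tweets)

# Without parentheses: the difference is computed first,
# then the union adds the sad tweets back in.
ungrouped = tweets - happy_tweets | sad_tweets

print(grouped)    # {'a neutral one'}
print(ungrouped)  # contains both 'a sad one' and 'a neutral one'
```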
CodePudding user response:
Here is my code. It is longer, but it shows the solution another way: filter out all the happy and sad tweets, and every item left over is neutral. The number of tweets of each kind is then counted with len().
tweets = [
"Wow, what a great day today!! #sunshine",
"I feel sad about the things going on around us. #covid19",
"I'm really excited to learn Python with @JovianML #zerotopandas",
"This is a really nice song. #linkinpark",
"The python programming language is useful for data science",
"Why do bad things happen to me?",
"Apple announces the release of the new iPhone 12. Fans are excited.",
"Spent my day with family!! #happy",
"Check out my blog post on common string operations in Python. #zerotopandas",
"Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']
happy_tweets_list = []
sad_tweets_list = []
neutral_tweets_list = [item for item in tweets]
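for tweet in tweets:
    for word in happy_words:
        if word in tweet:
            happy_tweets_list.append(tweet)
            neutral_tweets_list.remove(tweet)
            break  # stop at the first match so the tweet is not added twice
    for word in sad_words:
        if word in tweet:
            sad_tweets_list.append(tweet)
            # the tweet may already have been removed by a happy word
            if tweet in neutral_tweets_list:
                neutral_tweets_list.remove(tweet)
            break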
happy_tweets = len(happy_tweets_list)
sad_tweets = len(sad_tweets_list)
neutral_tweets = len(neutral_tweets_list)
CodePudding user response:
Another way to check whether any of the words occurs as a substring of each tweet:
tweets = [
"Wow, what a great day today!! #sunshine",
"I feel sad about the things going on around us. #covid19",
"I'm really excited to learn Python with @JovianML #zerotopandas",
"This is a really nice song. #linkinpark",
"The python programming language is useful for data science",
"Why do bad things happen to me?",
"Apple announces the release of the new iPhone 12. Fans are excited.",
"Spent my day with family!! #happy",
"Check out my blog post on common string operations in Python. #zerotopandas",
"Freecodecamp has great coding tutorials. #skillup"
]
happy_words = ['great', 'excited', 'happy', 'nice', 'wonderful', 'amazing', 'good', 'best']
sad_words = ['sad', 'bad', 'tragic', 'unhappy', 'worst']
happy_tweets = 0
sad_tweets = 0
neutral_tweets = 0
for tweet in tweets:
    # any() stops at the first matching word, so each tweet is counted once
    if any(happy_word in tweet for happy_word in happy_words):
        happy_tweets += 1
    elif any(sad_word in tweet for sad_word in sad_words):
        sad_tweets += 1
    else:
        neutral_tweets += 1
print(happy_tweets)
print(sad_tweets)
print(neutral_tweets)
Output:
6
2
2
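If the tweets themselves are needed rather than just the counts, the same any() test can fill lists instead of incrementing counters. This is only a sketch: the function name classify_tweets and the dictionary keys are hypothetical, and the sample uses a shortened selection of the tweets and word lists. As with the if/elif chain above, a tweet matching both a happy and a sad word is treated as happy.

```python
def classify_tweets(tweets, happy_words, sad_words):
    """Group tweets into happy, sad and neutral lists by substring match."""
    groups = {"happy": [], "sad": [], "neutral": []}
    for tweet in tweets:
        if any(word in tweet for word in happy_words):
            groups["happy"].append(tweet)
        elif any(word in tweet for word in sad_words):
            groups["sad"].append(tweet)
        else:
            groups["neutral"].append(tweet)
    return groups


# Shortened sample taken from the tweets in the question.
sample_tweets = [
    "Wow, what a great day today!! #sunshine",
    "Why do bad things happen to me?",
    "The python programming language is useful for data science",
]
groups = classify_tweets(sample_tweets, ["great", "excited"], ["sad", "bad"])
for label, members in groups.items():
    print(label, len(members), members)
```

The counts from the loop above are then just len(groups["happy"]) and so on.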