List comprehension - comparing one list to another list with indexing

I have a list of tweets, where each tweet is itself a list holding an ID and a message:

twitter_dataset_list = [['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'], ['322185112684994545', 'I had a bad day']]

I want to check each element's message against the following keyword lists to see whether it is positive or negative:

positive_keyword_list = ['nice']

negative_keyword_list = ['bad']

If a message is positive or negative, I want to append a flag to the corresponding sublist, like this:

[['322185112684994561', '@Bill_Porter nice to know that your site is back :-)', 1], ['322185112684994545', 'I had a bad day', -1]]

I've written this, but I'm not sure how to iterate and sub-index:

for element in twitter_dataset_list:
    if any(word in twitter_dataset_list[0][1] for word in positive_keyword_list) == True:
        twitter_dataset_list.append('1')
    elif any(word in twitter_dataset_list[0][1] for word in negative_keyword_list) == True:
        twitter_dataset_list.append('-1')
    else:
        twitter_dataset_list[0][1].append('0')

print(twitter_dataset_list)

So how do I iterate over twitter_dataset_list?

CodePudding user response：

Each element in your twitter_dataset_list is a list of strings, so you have to iterate over it to check for the words in the positive and negative lists. Then you could use the index of the sublists to append flags in-place:

for i, element in enumerate(twitter_dataset_list):
    if any(word in s for s in element for word in positive_keyword_list):
        twitter_dataset_list[i].append('1')
    elif any(word in s for s in element for word in negative_keyword_list):
        twitter_dataset_list[i].append('-1')
    else:
        twitter_dataset_list[i].append('0')

print(twitter_dataset_list)

Output:

[['322185112684994561', '@Bill_Porter nice to know that your site is back :-)', '1'], 
 ['322185112684994545', 'I had a bad day', '-1']]

CodePudding user response：

First, the enumerate function is useful here, because it'll give you both the index and the values as you iterate over the list.

Second, you can unpack as you go using the for i, (id, text) in syntax.

And finally, you can use _ for any unpacked value that you don't actually use in the loop. (Here, I don't need the ID, so I just put _ to signal that it is ignored.)

More details on the different ways you can unpack things in loops are available in the Python docs on data structures.

for i, (_, text) in enumerate(twitter_dataset_list):
    if any(word in text for word in positive_keyword_list):
        twitter_dataset_list[i].append(1)
    elif any(word in text for word in negative_keyword_list):
        twitter_dataset_list[i].append(-1)
    else:
        twitter_dataset_list[i].append(0)

CodePudding user response：

I recommend not altering the original data, and instead returning a new list:

positive_keyword_set = {"nice",}
negative_keyword_set = {"bad",}

tweets_with_sentiments = []
for tweet_id, tweet in twitter_dataset_list:
    sentiment = 0
    words = tweet.lower().split()
    if negative_keyword_set.intersection(words):
        sentiment = -1
    elif positive_keyword_set.intersection(words):
        sentiment = 1
    tweets_with_sentiments.append([tweet_id, tweet, sentiment])

Note that I've also converted your keyword lists to sets. This allows for O(1) membership lookups, since the values stored in a set are hashed. It also lets you simply call set.intersection() on the tweet's words to find the keywords:

>>> tweet = '@Bill_Porter nice to know that your site is back :-)'

>>> tweet.lower().split()
['@bill_porter',
 'nice',
 'to',
 'know',
 'that',
 'your',
 'site',
 'is',
 'back',
 ':-)']

>>> positive_keyword_set.intersection(tweet.split())
{'nice'}
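One thing worth checking is how word-set matching differs from the substring test (`word in text`) used in the other answers. The sketch below is only an illustration: `find_keywords` is a hypothetical helper, and tokenizing with `re.findall(r"\w+", ...)` is an assumption about how punctuation should be handled, not something the answer above does.

```python
import re

positive_keyword_set = {"nice"}

def find_keywords(text, keywords):
    """Return the keywords that occur as whole words in text (hypothetical helper)."""
    # \w+ extracts runs of letters, digits and underscores, so punctuation is dropped.
    words = set(re.findall(r"\w+", text.lower()))
    return keywords & words

# The substring test also matches inside longer words such as "nicely".
print("nice" in "What a nicely done site")

# Whole-word matching does not.
print(find_keywords("What a nicely done site", positive_keyword_set))

# Plain split() keeps punctuation attached, so "nice!" is not equal to "nice".
print(positive_keyword_set.intersection("Nice!".lower().split()))
print(find_keywords("Nice!", positive_keyword_set))
```

Which behavior is right depends on the data: substring matching is more permissive, while whole-word matching avoids accidental hits inside other words.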

In fact, I'd suggest going so far as using a dict to store tweet sentiment:

positive_keyword_set = {"nice",}
negative_keyword_set = {"bad",}

tweets_with_sentiments = {}
for tweet_id, tweet in twitter_dataset_list:
    sentiment = 0
    # Lowercase and split once, as in the list version above.
    words = tweet.lower().split()
    if negative_keyword_set.intersection(words):
        sentiment = -1
    elif positive_keyword_set.intersection(words):
        sentiment = 1
    tweets_with_sentiments[int(tweet_id)] = dict(tweet=tweet, sentiment=sentiment)

Now your data structure can be accessed in O(1) by the tweet ID:

>>> tweets_with_sentiments
{322185112684994561: {'tweet': '@Bill_Porter nice to know that your site is back :-)', 'sentiment': 1},
 322185112684994545: {'tweet': 'I had a bad day', 'sentiment': -1}}

>>> tweets_with_sentiments[322185112684994561]["sentiment"]
1

CodePudding user response：

I would create a function to handle the sentiment, as I assume this part of the code may evolve (I use the TextBlob library for a similar app):

twitter_dataset_list = [['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'],
                        ['322185112684994545', 'I had a bad day']]

def text_positivity(tweet_text: str) -> list:
    # https://www.adamsmith.haus/python/answers/how-to-check-if-a-string-contains-an-element-from-a-list-in-python#:~:text=Use any() to check,to build the generator expression.
    positive_keyword_list = ['nice']
    negative_keyword_list = ['bad']
    if any(keyword in tweet_text for keyword in positive_keyword_list):
        return [1]
    if any(keyword in tweet_text for keyword in negative_keyword_list):
        return [-1]
    return [0]

# Concatenate each [id, text] sublist with the one-element flag list.
twitter_dataset_list = [tweet_details + text_positivity(tweet_text=tweet_details[1]) for tweet_details in twitter_dataset_list]
print(twitter_dataset_list)
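If the keyword lists are expected to change, one possible variation is to pass them in as parameters and return a plain integer flag. This is a minimal sketch under that assumption; `classify_tweet`, its parameter names, and the `flagged` variable are hypothetical and not part of the answers above.

```python
def classify_tweet(text, positive_keywords=("nice",), negative_keywords=("bad",)):
    """Return 1, -1 or 0 using substring matching (hypothetical helper)."""
    if any(keyword in text for keyword in positive_keywords):
        return 1
    if any(keyword in text for keyword in negative_keywords):
        return -1
    return 0

twitter_dataset_list = [
    ['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'],
    ['322185112684994545', 'I had a bad day'],
]

# Build new rows with a list comprehension; the original list is left unchanged.
flagged = [[tweet_id, text, classify_tweet(text)] for tweet_id, text in twitter_dataset_list]
print(flagged)
```

Unpacking `tweet_id, text` in the comprehension avoids indexing with `[1]`, and returning an int keeps the flag in the same form as the desired output in the question.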