The pandas DataFrame does not get updated based on a condition

I have a dataframe and I need to update a column based on a condition (I'm trying to label text using the Microsoft Azure API and then save the label back to the original dataframe so that I can calculate the accuracy later). But weirdly, the dataframe does not get updated!

This is a sample code:

import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient


key = "key"
endpoint = "https://endpoint"

text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

df = pd.DataFrame({'id':[1,2,3], 'text': ['im ok', 'you arent ok', 'its fine'],
                   'Sentiment':['positive', 'negative', 'neutral']})
n = 10

for i in range(0, df.shape[0], n):
    result = text_analytics_client.analyze_sentiment(df.iloc[i:i + n].to_dict('records'))
###### In case you do not have Azure credentials to run this code, the output in result looks like this:
######[AnalyzeSentimentResult(id=2, sentiment=negative, warnings= [], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), sentences=[SentenceSentiment(text=you arent ok, sentiment=negative, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), length=12, offset=0, mined_opinions=[])], is_error=False), AnalyzeSentimentResult(id=3, sentiment=positive, warnings=[], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), sentences=[SentenceSentiment(text=its fine, sentiment=positive, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), length=8, offset=0, mined_opinions=[])], is_error=False)]

    for idx, doc in enumerate(result):
        print(doc.sentiment) ##this will print out a value
        id_res = result[idx]['id']
        #print(id_res) this will print out the correct id
        df.loc[df.id == id_res, 'label'] = doc.sentiment
        print(df) ### but here when the dataframe is printed the label column is NAN

I have searched and found multiple links like this, this or this. In all three examples they do the same thing as I do, but my dataframe does not get updated, and this is the result I get:

   id          text   Sentiment label
0   1         im ok  positive   NaN
1   2  you arent ok  negative   NaN
2   3      its fine   neutral   NaN

details

I'm adding some details in case they help. As I commented in the code, id_res has the correct id. When I replace df.loc[df.id == id_res, 'label'] with df.loc[df.id == 1, 'label'], that row is updated successfully, but otherwise nothing gets updated!

Appreciate any input on how to fix this.

CodePudding user response：

The issue is in this line here:

df.loc[df.id == id_res, 'label'] = doc.sentiment

df.id has an integer dtype and id_res is a string, so the comparison df.id == id_res is False for every row and nothing is assigned. If you convert id_res to int, the comparison becomes valid and you'll get the output you're looking for:

df.loc[df.id == int(id_res), 'label'] = doc.sentiment

Output:

   id          text Sentiment     label
0   1         im ok  positive   neutral
1   2  you arent ok  negative  negative
2   3      its fine   neutral  positive
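
For anyone without Azure credentials, here is a minimal sketch that reproduces the mismatch and the fix offline. The result list is a hypothetical stand-in built with SimpleNamespace, not the real SDK objects. It assumes, as described above, that the ids come back as strings, and the sentiment values are taken from the output above.

```python
import pandas as pd
from types import SimpleNamespace

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["im ok", "you arent ok", "its fine"],
    "Sentiment": ["positive", "negative", "neutral"],
})

# Hypothetical stand-ins for the service results (assumption: ids are strings).
result = [
    SimpleNamespace(id="1", sentiment="neutral"),
    SimpleNamespace(id="2", sentiment="negative"),
    SimpleNamespace(id="3", sentiment="positive"),
]

# An integer column compared with a string matches no rows,
# so a .loc assignment with that mask changes nothing.
matches_as_str = (df["id"] == "2").any()
# After converting the string to int, the row is found.
matches_as_int = (df["id"] == int("2")).any()
print(matches_as_str, matches_as_int)

# The fix: convert each returned id to int before building the mask.
for doc in result:
    df.loc[df["id"] == int(doc.id), "label"] = doc.sentiment

print(df)
```

Printing type(id_res) and df.id.dtype inside the original loop is a quick way to spot this kind of mismatch, since a string id and an integer id look identical when printed.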