I have a dataframe and need to update a column based on a condition. (I'm labeling text with the Microsoft Azure API and saving each label back to the original dataframe so that I can calculate the accuracy later.) Strangely, the dataframe does not get updated.
Here is some sample code:
import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
key = "key"
endpoint = "https://endpoint"
text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
df = pd.DataFrame({'id': [1, 2, 3], 'text': ['im ok', 'you arent ok', 'its fine'],
                   'Sentiment': ['positive', 'negative', 'neutral']})
n = 10
for i in range(0, df.shape[0], n):
    result = text_analytics_client.analyze_sentiment(df.iloc[i:i + n].to_dict('records'))
    # In case you do not have Azure credentials to run this code, the output of result looks like this:
    # [AnalyzeSentimentResult(id=2, sentiment=negative, warnings= [], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), sentences=[SentenceSentiment(text=you arent ok, sentiment=negative, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), length=12, offset=0, mined_opinions=[])], is_error=False), AnalyzeSentimentResult(id=3, sentiment=positive, warnings=[], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), sentences=[SentenceSentiment(text=its fine, sentiment=positive, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), length=8, offset=0, mined_opinions=[])], is_error=False)]
    for idx, doc in enumerate(result):
        print(doc.sentiment)  # this prints a value
        id_res = result[idx]['id']
        # print(id_res)  # this prints the correct id
        df.loc[df.id == id_res, 'label'] = doc.sentiment
print(df)  # but here, when the dataframe is printed, the label column is NaN
I have searched and found several similar questions, like this, this or this. All three examples do the same thing I am doing, but my dataframe does not get updated, and this is the result I get:
   id          text Sentiment label
0   1         im ok  positive   NaN
1   2  you arent ok  negative   NaN
2   3      its fine   neutral   NaN
Details
I'm adding some details in case they help. As I commented in the code, id_res holds the correct id. When I replace df.loc[df.id == id_res, 'label'] with df.loc[df.id == 1, 'label'], that row is updated successfully, but otherwise nothing gets updated.
Appreciate any input on how to fix this.
CodePudding user response:
The issue is in this line:
df.loc[df.id == id_res, 'label'] = doc.sentiment
df.id holds integers, while id_res is a string. Comparing an integer column with a string never matches, so the mask is False for every row and no label is assigned. If you convert id_res to int, the comparison becomes valid and you'll get the output you're looking for:
df.loc[df.id == int(id_res), 'label'] = doc.sentiment
Output:
   id          text Sentiment     label
0   1         im ok  positive   neutral
1   2  you arent ok  negative  negative
2   3      its fine   neutral  positive
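If you want to check the mismatch without Azure credentials, here is a minimal sketch. It assumes the service hands the ids back as strings, and it uses a hypothetical fake_results list of plain dicts in place of the real AnalyzeSentimentResult objects, so it only illustrates the comparison, not the API itself.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["im ok", "you arent ok", "its fine"],
})

# Hypothetical stand-in for the service response (assumption: ids are strings).
fake_results = [
    {"id": "1", "sentiment": "neutral"},
    {"id": "2", "sentiment": "negative"},
    {"id": "3", "sentiment": "positive"},
]

# An integer column compared with a string matches no rows at all.
mask_str = df["id"] == "2"
# After converting the string to int, exactly one row matches.
mask_int = df["id"] == int("2")
print(mask_str.tolist())
print(mask_int.tolist())

# Apply the fix: cast each returned id to int before building the mask.
for doc in fake_results:
    df.loc[df["id"] == int(doc["id"]), "label"] = doc["sentiment"]

labels = df["label"].tolist()
print(df)
```

Printing type(id_res) next to df.id.dtype in your own loop is a quick way to confirm which side needs converting. Casting the column instead, with df.id.astype(str) == id_res, should work just as well if you would rather leave id_res untouched.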