How to handle errors from IBM Watson when iterating over rows

I am a student working on a project using IBM Watson's NLU to parse various news articles and return a sentiment score. I have the articles in a table, and I have a loop set up to go through each cell in the first column, analyze it, normalize the result, and append the new data to the table.

masterdf = pd.DataFrame()

for index, row in df.iterrows():
    text2 = row['CONTENT']
    response = natural_language_understanding.analyze(
        text = text2,
        features=Features(sentiment=SentimentOptions(targets=["Irish",]))).get_result()
    json_tbl = pd.json_normalize(response['sentiment'], 
                       record_path='targets',
                       meta=[['document','score'], ['document','label']])
    json_tbl = json_tbl.set_index([pd.Index([index])])
    print(json_tbl.head())
    masterdf = masterdf.append(json_tbl)

masterdf = pd.concat([df, masterdf], axis=1)
masterdf.head()

The issue that I am having is that sometimes the entity that I am targeting isn't in the article I am analyzing, and so IBM throws an error. This completely breaks my code. What I would like to do is that whenever IBM returns an error, my code just fills in the row with "N/A" and progresses to the next cell below it. I am really a beginner so any help would be really really appreciated.

Answer:

I would recommend creating a separate function to encapsulate all that sentiment analysis logic. In the end, you would call it like this:

df['SENTIMENT_SCORE'] = df['CONTENT'].apply(safe_complex_function)

safe_complex_function is a placeholder for your new, safe function; name it whatever you like. It would probably look something like this:

# Assumption: the Watson SDK signals failed requests with ApiException.
# If your traceback shows a different exception, catch that one instead.
from ibm_watson import ApiException

def sentiment_scores(content):
    try:
        response = natural_language_understanding.analyze(
            text=content,
            features=Features(
                sentiment=SentimentOptions(targets=["Irish"])
            )
        ).get_result()
        json_tbl = pd.json_normalize(
            response['sentiment'],
            record_path='targets',
            meta=[['document', 'score'], ['document', 'label']]
        )
        # Only one target is requested, so its score is in the first row
        return json_tbl.loc[0, 'score']
    except ApiException:  # please don't catch a bare Exception; it is too general
        return None
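
If you would rather keep the row-by-row loop from the question, the same try/except can live inside it. The following is only a minimal sketch, not the Watson API: `analyze_stub` and `TargetNotFound` are hypothetical stand-ins for the real `analyze(...)` call and for whatever exception the SDK raises, and the response shape is assumed from the question's `json_normalize` call. Rows are collected in a list and combined once with `pd.concat`, because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


class TargetNotFound(Exception):
    """Hypothetical stand-in for the error the real API raises."""


def analyze_stub(text, target="Irish"):
    """Hypothetical stand-in for the Watson call; fails if the target is absent."""
    if target not in text:
        raise TargetNotFound(f"target {target!r} not found")
    return {
        "sentiment": {
            "targets": [{"text": target, "score": 0.5, "label": "positive"}]
        }
    }


df = pd.DataFrame({"CONTENT": ["Irish news today", "Nothing relevant here"]})

rows = []
for index, row in df.iterrows():
    try:
        response = analyze_stub(row["CONTENT"])
        first = response["sentiment"]["targets"][0]
        rows.append({"score": first["score"], "label": first["label"]})
    except TargetNotFound:
        # Fill the row with a placeholder and move on to the next article
        rows.append({"score": "N/A", "label": "N/A"})

# Build the result once, aligned to the original index
result = pd.concat([df, pd.DataFrame(rows, index=df.index)], axis=1)
print(result)
```

Appending exactly one entry per row, whether the call succeeds or fails, keeps `rows` the same length as `df`, so the final `concat` lines up without any index bookkeeping.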

Here is a complete example that uses a mock in place of the Watson call:

Creating a test DataFrame

import pandas as pd

data = [
    (1, 'I am happy'),
    (2, 'I am sad'),
    (3, 'I am neutral'),
    (4, 'Exception generator')
]

df = pd.DataFrame(data,columns=['USER_ID','CONTENT'])

	USER_ID	CONTENT
0	1	I am happy
1	2	I am sad
2	3	I am neutral
3	4	Exception generator

Creating a mock sentiment analysis function

This function exists only to stand in for the real API call.

def fake_sentiment_analysis(content):
    sentiment_scores = {
        'sad': -1,
        'happy': 1,
        'neutral': 0
    }
    for sentiment, score in sentiment_scores.items():
        if sentiment in content:
            return score
    # raises KeyError, for demonstration purposes only
    return sentiment_scores['BROKEN']

def complex_function(element):
    sentiment_score = fake_sentiment_analysis(element)
    return sentiment_score

Applying the unsafe function to the DataFrame

Calling that function raises a KeyError:

df['CONTENT'].apply(complex_function)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-d418b58879b8> in <module>()
----> 1 df['CONTENT'].apply(complex_function)

2 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-12-538de52b436a> in fake_sentiment_analysis(content)
      9             return score
     10     # raises KeyError, for demonstration purposes only
---> 11     return sentiment_scores['BROKEN']

KeyError: 'BROKEN'

Adding an exception handler

You can make it safe by adding exception handling:

def safe_complex_function(element):
    try:
        sentiment_score = fake_sentiment_analysis(element)
    except KeyError:
        sentiment_score = None
    return sentiment_score

Applying it to the DataFrame no longer raises; the failing row gets a missing value instead:

df['SENTIMENT_SCORE'] = df['CONTENT'].apply(safe_complex_function)

	USER_ID	CONTENT	SENTIMENT_SCORE
0	1	I am happy	1
1	2	I am sad	-1
2	3	I am neutral	0
3	4	Exception generator	nan
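
The question asks for a literal "N/A" rather than a missing value. One possible approach, sketched here with the same mock function (shortened to three rows), is to keep the numeric column as it is and derive a separate display column. The `SENTIMENT_DISPLAY` column name and the `failed` mask are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"CONTENT": ["I am happy", "I am sad", "Exception generator"]})


def fake_sentiment_analysis(content):
    """Mock scorer from the example above; raises KeyError on unknown text."""
    sentiment_scores = {"sad": -1, "happy": 1, "neutral": 0}
    for sentiment, score in sentiment_scores.items():
        if sentiment in content:
            return score
    return sentiment_scores["BROKEN"]


def safe_complex_function(element):
    try:
        return fake_sentiment_analysis(element)
    except KeyError:
        return None


df["SENTIMENT_SCORE"] = df["CONTENT"].apply(safe_complex_function)

# Remember which rows failed before putting a placeholder in their place
failed = df["SENTIMENT_SCORE"].isna()

# A text placeholder needs an object column; the numeric column stays usable for arithmetic
df["SENTIMENT_DISPLAY"] = df["SENTIMENT_SCORE"].astype(object).where(~failed, "N/A")

print(df)
print("rows that failed:", int(failed.sum()))
```

Keeping the missing values in `SENTIMENT_SCORE` means aggregations such as `mean()` still work and skip the failed rows, while `SENTIMENT_DISPLAY` carries the "N/A" text for reporting.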