I am a student working on a project that uses IBM Watson's NLU to parse various news articles and return a sentiment score. I have the articles in a table, and I have a loop set up to go through each cell in the first column, analyze it, normalize the result, and append the new data to the table.
masterdf = pd.DataFrame()
for index, row in df.iterrows():
    text2 = row['CONTENT']
    response = natural_language_understanding.analyze(
        text = text2,
        features=Features(sentiment=SentimentOptions(targets=["Irish",]))).get_result()
    json_tbl = pd.json_normalize(response['sentiment'],
                                 record_path='targets',
                                 meta=[['document','score'], ['document','label']])
    json_tbl = json_tbl.set_index([pd.Index([index])])
    print(json_tbl.head())
    masterdf = masterdf.append(json_tbl)

masterdf = pd.concat([df, masterdf], axis=1)
masterdf.head()
The issue that I am having is that sometimes the entity that I am targeting isn't in the article I am analyzing, and so IBM throws an error. This completely breaks my code. What I would like to do is that whenever IBM returns an error, my code just fills in the row with "N/A" and progresses to the next cell below it. I am really a beginner so any help would be really really appreciated.
Answer:
I would recommend creating a separate function to encapsulate all that sentiment analysis logic. In the end, you would call it like this:
df['SENTIMENT_SCORE'] = df['CONTENT'].apply(safe_complex_function)
Here safe_complex_function would be your new safe function; give it whatever name you like. It would probably look something like this:
def sentiment_scores(content):
    try:
        response = natural_language_understanding.analyze(
            text=content,
            features=Features(
                sentiment=SentimentOptions(targets=["Irish",])
            )
        ).get_result()
        json_tbl = pd.json_normalize(
            response['sentiment'],
            record_path='targets',
            meta=[['document','score'], ['document','label']]
        )
        # The loop variable `index` does not exist inside this function, so
        # return the first target's score; .apply() aligns it with df's rows.
        return json_tbl['score'].iloc[0]
    except <the specific exception you want to handle>:  # avoid a bare Exception; it is too general
        return None
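If you are not sure which exception class to put in the except clause, one way to find out is to catch broadly once, purely for diagnosis, and print the exception's type. The sketch below assumes a hypothetical risky_call that stands in for the failing call; in your case it would be the Watson request, and the printed name will differ.

```python
def risky_call():
    # Hypothetical stand-in for the call that fails (here it raises KeyError).
    raise KeyError("BROKEN")


try:
    risky_call()
except Exception as exc:  # broad on purpose, for one-off diagnosis only
    # Build the fully qualified class name, e.g. "builtins.KeyError".
    exception_name = f"{type(exc).__module__}.{type(exc).__qualname__}"
    print(exception_name)
```

Once you know the class, import it and replace the broad except with that specific one, so unrelated bugs still surface.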
Here is an example:
Creating a test DataFrame
import pandas as pd

data = [
    (1, 'I am happy'),
    (2, 'I am sad'),
    (3, 'I am neutral'),
    (4, 'Exception generator')
]
df = pd.DataFrame(data, columns=['USER_ID', 'CONTENT'])
| | USER_ID | CONTENT |
|---|---|---|
| 0 | 1 | I am happy |
| 1 | 2 | I am sad |
| 2 | 3 | I am neutral |
| 3 | 4 | Exception generator |
Creating a mock sentiment analysis function
This function exists only to simulate the real sentiment analysis call.
def fake_sentiment_analysis(content):
    sentiment_scores = {
        'sad': -1,
        'happy': 1,
        'neutral': 0
    }
    for sentiment, score in sentiment_scores.items():
        if sentiment in content:
            return score
    # raises KeyError only for demonstration purposes
    return sentiment_scores['BROKEN']

def complex_function(element):
    sentiment_score = fake_sentiment_analysis(element)
    return sentiment_score
Applying the unsafe function to the DataFrame
Calling that function raises a KeyError:
df['CONTENT'].apply(complex_function)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-22-d418b58879b8> in <module>()
----> 1 df['CONTENT'].apply(complex_function)
2 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-12-538de52b436a> in fake_sentiment_analysis(content)
9 return score
10 # raises KeyError only for demonstration purposes
---> 11 return sentiment_scores['BROKEN']
KeyError: 'BROKEN'
Adding an exception handler
You can make it safer by adding exception handling:
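def safe_complex_function(element):
    try:
        sentiment_score = fake_sentiment_analysis(element)
    except KeyError:
        # The analysis failed for this row: store a missing value and move on
        sentiment_score = None
    return sentiment_score

df['SENTIMENT_SCORE'] = df['CONTENT'].apply(safe_complex_function)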
| | USER_ID | CONTENT | SENTIMENT_SCORE |
|---|---|---|---|
| 0 | 1 | I am happy | 1 |
| 1 | 2 | I am sad | -1 |
| 2 | 3 | I am neutral | 0 |
| 3 | 4 | Exception generator | NaN |
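If you want more than one value per article (for example a score and a label) and a literal "N/A" in failed rows, as the question asks, the safe function can return a pandas Series, which apply expands into columns. This is only a sketch: fake_analyze and FakeApiError are hypothetical stand-ins for the real Watson call and whatever exception it raises, and the score and label values are made up.

```python
import pandas as pd


class FakeApiError(Exception):
    """Hypothetical stand-in for the error the real service raises."""


def fake_analyze(content):
    # Hypothetical stand-in for the real request: fails when the target is absent.
    if "Irish" not in content:
        raise FakeApiError("target not found")
    return {"score": 0.5, "label": "positive"}


def safe_analyze(content):
    try:
        result = fake_analyze(content)
    except FakeApiError:
        # Fill the row with "N/A" and let apply() continue with the next row.
        return pd.Series({"score": "N/A", "label": "N/A"})
    return pd.Series(result)


df = pd.DataFrame({"CONTENT": ["Irish news is good", "Nothing relevant here"]})

# A function that returns a Series makes Series.apply produce a DataFrame.
scores = df["CONTENT"].apply(safe_analyze)
result = pd.concat([df, scores], axis=1)
print(result)
```

Note that mixing the string "N/A" with numbers turns the score column into an object column; returning None instead keeps it numeric, with NaN in the failed rows.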