I am a beginner with Python, and I am trying to use IBM's sentiment analyzer to build a dataset. I get a JSON response that I want to put into a table. So far what I have is:
response = natural_language_understanding.analyze(
    text=df_text,
    features=Features(sentiment=SentimentOptions(targets=['Pericles']))).get_result()
print(json.dumps(response, indent=2))
respj = json.dumps(response['sentiment'])
respj
which prints
'{"targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}], "document": {"score": -0.903556, "label": "negative"}}'
At this point I would like to make a pandas table from this data. Ideally, I would like all the above information formatted like this: Text | Text score | Document score
I don't really need the positive/negative label, but it doesn't hurt to have it. How would I accomplish this? Right now when I try
json_df = pd.read_json(respj)
json_df.head()
I get
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-b06d8a1caf3f> in <module>
----> 1 json_df = pd.read_json(respj)
2 json_df.head()
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
212 else:
213 kwargs[new_arg_name] = new_arg_value
--> 214 return func(*args, **kwargs)
215
216 return cast(F, wrapper)
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
606 return json_reader
607
--> 608 result = json_reader.read()
609 if should_close:
610 filepath_or_buffer.close()
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read(self)
729 obj = self._get_object_parser(self._combine_lines(data.split("\n")))
730 else:
--> 731 obj = self._get_object_parser(self.data)
732 self.close()
733 return obj
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
751 obj = None
752 if typ == "frame":
--> 753 obj = FrameParser(json, **kwargs).parse()
754
755 if typ == "series" or obj is None:
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in parse(self)
855
856 else:
--> 857 self._parse_no_numpy()
858
859 if self.obj is None:
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1086
1087 if orient == "columns":
-> 1088 self.obj = DataFrame(
1089 loads(json, precise_float=self.precise_float), dtype=None
1090 )
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
433 )
434 elif isinstance(data, dict):
--> 435 mgr = init_dict(data, index, columns, dtype=dtype)
436 elif isinstance(data, ma.MaskedArray):
437 import numpy.ma.mrecords as mrecords
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
252 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
253 ]
--> 254 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
255
256
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
62 # figure out the index, if necessary
63 if index is None:
---> 64 index = extract_index(arrays)
65 else:
66 index = ensure_index(index)
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in extract_index(data)
366
367 if have_dicts:
--> 368 raise ValueError(
369 "Mixing dicts with non-Series may lead to ambiguous ordering."
370 )
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering
If anyone can give me some tips on how to build the table I am after, I would really appreciate it. It would also be great if someone could explain the error I am getting. I think I understand the basic premise: the JSON already contains two incompatible "tables". Thank you for any help.
CodePudding user response:
You don't need to dump response['sentiment'] as a JSON string if you just want to turn it into a DataFrame. Use pandas.json_normalize instead.
It seems that response['sentiment'] looks something like
>>> response['sentiment']
{
"targets": [{"text": "Pericles",
"score": -0.939436,
"label": "negative"}],
"document": {"score": -0.903556,
"label": "negative"}
}
Then, you just need
df = pd.json_normalize(response['sentiment'],
                       record_path='targets',
                       meta=[['document', 'score'], ['document', 'label']])
Output
>>> df
       text     score     label  document.score document.label
0  Pericles -0.939436  negative       -0.903556       negative
Optionally, you can rename the columns afterwards as you wish using DataFrame.rename:
cols_mapping = {
    'text': 'Text',
    'score': 'Text Score',
    'label': 'Text Label',
    'document.score': 'Document Score',
    'document.label': 'Document Label'
}
df = df.rename(columns=cols_mapping)
>>> df
       Text  Text Score Text Label  Document Score Document Label
0  Pericles   -0.939436   negative       -0.903556       negative
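For anyone who wants to try this without calling the IBM service, here is a minimal self-contained sketch. It assumes the response['sentiment'] payload has exactly the shape printed in the question, and it keeps only the three columns the question asks for. The helper name sentiment_to_frame is made up for illustration and is not part of pandas or the IBM SDK.

```python
import pandas as pd

# Sample payload copied from the question; a real response may list more targets.
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}


def sentiment_to_frame(sentiment: dict) -> pd.DataFrame:
    """Hypothetical helper: one row per target, document score repeated on each row."""
    df = pd.json_normalize(
        sentiment,
        record_path="targets",         # each item in "targets" becomes a row
        meta=[["document", "score"]],  # nested value copied onto every row
    )
    df = df.rename(
        columns={
            "text": "Text",
            "score": "Text Score",
            "document.score": "Document Score",
        }
    )
    # Keep only the columns requested in the question (drops "label").
    return df[["Text", "Text Score", "Document Score"]]


table = sentiment_to_frame(sentiment)
print(table)
```

Because record_path walks the "targets" list, the same helper should also produce one row per target if several targets were requested, with the document score repeated on each row.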
CodePudding user response:
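I believe this should work for you (here j is the parsed sentiment dict, i.e. response['sentiment'], not the JSON string):
j = response['sentiment']
# One column per target field (text, score, label)
targets = {k: [t[k] for t in j['targets']] for k in j['targets'][0].keys()}
# Repeat the document score once per target so all columns have the same length
doc_scores = [j['document']['score']] * len(j['targets'])
pd.DataFrame({'document_score': doc_scores, **targets})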
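As for the error in the question: the traceback ends inside the DataFrame constructor, which was handed a dict whose "targets" value is a list and whose "document" value is a dict. pandas cannot tell whether the dict keys or the list positions should form the row index, so it raises. Building a dict of equal-length lists first avoids that ambiguity. The sketch below assumes the sample payload from the question and is only meant as an illustration.

```python
import pandas as pd

# Sample payload copied from the question.
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

# Passing the nested dict directly mixes a list ("targets") with a dict
# ("document"), which is what pd.read_json ended up doing in the question.
try:
    pd.DataFrame(sentiment)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Flatten by hand: every column is a plain list with one entry per target.
targets = sentiment["targets"]
columns = {key: [t[key] for t in targets] for key in targets[0]}
columns["document_score"] = [sentiment["document"]["score"]] * len(targets)

df = pd.DataFrame(columns)
print(df)
```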