How to make a table from JSON? ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

I am a beginner with Python, and I am trying to use IBM's sentiment analyzer to build a dataset. I get a JSON response that I want to put into a table. So far I have:

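import json
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions
# natural_language_understanding: an already-authenticated NaturalLanguageUnderstandingV1 client
response = natural_language_understanding.analyze(
    text=df_text,
    features=Features(sentiment=SentimentOptions(targets=['Pericles']))).get_result()
print(json.dumps(response, indent=2))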

respj = json.dumps(response['sentiment'])
respj

which outputs

'{"targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}], "document": {"score": -0.903556, "label": "negative"}}'

Now I would like to turn this data into a pandas table. Ideally, I would like the information above formatted as: Text | Text score | Document score

I don't really need the positive/negative labels, but it doesn't hurt to have them. How would I accomplish this? Right now, when I try

json_df = pd.read_json(respj)
json_df.head()

I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-b06d8a1caf3f> in <module>
----> 1 json_df = pd.read_json(respj)
      2 json_df.head()

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    212                 else:
    213                     kwargs[new_arg_name] = new_arg_value
--> 214             return func(*args, **kwargs)
    215 
    216         return cast(F, wrapper)

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    606         return json_reader
    607 
--> 608     result = json_reader.read()
    609     if should_close:
    610         filepath_or_buffer.close()

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read(self)
    729             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    730         else:
--> 731             obj = self._get_object_parser(self.data)
    732         self.close()
    733         return obj

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    751         obj = None
    752         if typ == "frame":
--> 753             obj = FrameParser(json, **kwargs).parse()
    754 
    755         if typ == "series" or obj is None:

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in parse(self)
    855 
    856         else:
--> 857             self._parse_no_numpy()
    858 
    859         if self.obj is None:

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1086 
   1087         if orient == "columns":
-> 1088             self.obj = DataFrame(
   1089                 loads(json, precise_float=self.precise_float), dtype=None
   1090             )

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    433             )
    434         elif isinstance(data, dict):
--> 435             mgr = init_dict(data, index, columns, dtype=dtype)
    436         elif isinstance(data, ma.MaskedArray):
    437             import numpy.ma.mrecords as mrecords

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    252             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    253         ]
--> 254     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    255 
    256 

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     62     # figure out the index, if necessary
     63     if index is None:
---> 64         index = extract_index(arrays)
     65     else:
     66         index = ensure_index(index)

/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in extract_index(data)
    366 
    367             if have_dicts:
--> 368                 raise ValueError(
    369                     "Mixing dicts with non-Series may lead to ambiguous ordering."
    370                 )

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

If anyone can give me some tips on how to build the table I described, I would really appreciate it. An explanation of the error I am getting would also be great. My guess is that it happens because the JSON already contains two incompatible "tables". Thank you for any help.

Answer:

You don't need to dump response['sentiment'] to a JSON string if you just want to turn it into a DataFrame, because it is already a Python dict. Use pandas.json_normalize instead.
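As for the error itself: pd.read_json parses respj back into a dict and passes it straight to the DataFrame constructor (the DataFrame(...) frame in the traceback). The "targets" value is a list, which has no row labels, while the "document" value is a dict, whose keys pandas would treat as row labels. pandas refuses to guess how to line the two up, so it raises. A minimal reproduction, using the sample data from the question as a literal dict:

```python
import pandas as pd

# Same shape as response['sentiment']: one key maps to a list (targets),
# the other to a dict (document).
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

error_message = None
try:
    # This is effectively what pd.read_json(respj) does internally.
    pd.DataFrame(sentiment)
except ValueError as err:
    # The list has only a length; the dict has keys that would become the
    # index. Mixing the two is rejected as ambiguous.
    error_message = str(err)

print(error_message)
```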

It seems that response['sentiment'] looks something like

>>> response['sentiment']

{
    "targets": [{"text": "Pericles", 
                 "score": -0.939436, 
                 "label": "negative"}], 
    "document": {"score": -0.903556, 
                 "label": "negative"}
}

Then, you just need

df = pd.json_normalize(response['sentiment'], 
                       record_path='targets',
                       meta=[['document','score'], ['document','label']])

Output

>>> df

       text     score     label document.score document.label
0  Pericles -0.939436  negative      -0.903556       negative
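The meta values are copied onto every row produced by record_path, so the same call also works when the response contains several targets. A self-contained sketch, where the second target ("Athens" and its score) is invented sample data rather than real Watson output:

```python
import pandas as pd

# Hypothetical response with two targets; the "Athens" entry is made up
# purely for illustration.
sentiment = {
    "targets": [
        {"text": "Pericles", "score": -0.939436, "label": "negative"},
        {"text": "Athens", "score": 0.5, "label": "positive"},
    ],
    "document": {"score": -0.903556, "label": "negative"},
}

# record_path: one row per element of sentiment["targets"].
# meta: nested paths repeated on every row; column names are the path
# parts joined with "." (e.g. "document.score").
df = pd.json_normalize(
    sentiment,
    record_path="targets",
    meta=[["document", "score"], ["document", "label"]],
)
print(df)
```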

Optionally, you can rename the columns afterwards as you wish using DataFrame.rename:

cols_mapping = {
    'text': 'Text', 
    'score': 'Text Score', 
    'label': 'Text Label', 
    'document.score': 'Document Score', 
    'document.label': 'Document Label'
}

df = df.rename(columns=cols_mapping)

>>> df 

       Text  Text Score Text Label Document Score Document Label
0  Pericles   -0.939436   negative      -0.903556       negative
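If you only want the three columns from the question (Text | Text Score | Document Score), you can select them after renaming. json_normalize may store meta columns such as document.score with object dtype, so casting to float is a cheap safeguard before doing arithmetic or sorting. A sketch using the sample data from the question:

```python
import pandas as pd

sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

df = pd.json_normalize(
    sentiment,
    record_path="targets",
    meta=[["document", "score"], ["document", "label"]],
)

# Rename, then keep only the requested columns (labels are dropped).
table = df.rename(
    columns={"text": "Text", "score": "Text Score", "document.score": "Document Score"}
)[["Text", "Text Score", "Document Score"]]

# Ensure the document score is numeric, whatever dtype json_normalize chose.
table = table.astype({"Document Score": float})
print(table)
print(table.dtypes)
```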

Answer:

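I believe this should work for you, where j is the parsed sentiment dict (not the JSON string):

import pandas as pd
j = response['sentiment']  # already a dict, no json.dumps needed
targets = {k: [t[k] for t in j['targets']] for k in j['targets'][0].keys()}  # one list per target field
doc_scores = [j['document']['score']] * len(j['targets'])  # repeat the document score for each target row
pd.DataFrame({'document_score': doc_scores, **targets})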
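A shorter variant of the same idea, assuming j is the parsed sentiment dict (response['sentiment']): build the frame directly from the list of target dicts and let DataFrame.assign broadcast the document-level values onto every row. The document_label column is an extra not present in the snippet above.

```python
import pandas as pd

# Stand-in for response['sentiment'], copied from the question's output.
j = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

# A list of dicts becomes one row per dict; assign() repeats each scalar
# document value on every row.
df = pd.DataFrame(j["targets"]).assign(
    document_score=j["document"]["score"],
    document_label=j["document"]["label"],
)
print(df)
```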