"trailing data" error when reading json to Pandas dataframe

Time:11-20

I have a Python 3.8.5 script that gets JSON from an API, saves it to disk, and then reads the file into a DataFrame. This works:

df = pd.io.json.read_json('json_file', orient='records')

I want to use an in-memory buffer instead so I don't have to write to and read from disk, but I am getting an error. The code looks like this:

import json
import pandas as pd
from io import StringIO

io = StringIO()
json_out = []
# some code to append API results to json_out
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())

On that last line I get this error:

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 618, in read_json
    result = json_reader.read()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 755, in read
    obj = self._get_object_parser(self.data)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 777, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 886, in parse
    self._parse_no_numpy()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 1119, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None

ValueError: Trailing data

The JSON is a list of objects. This is not the actual data, but it looks like this when I write it to disk:

json = [
      {"state": "North Dakota",
        "address": "123 30th st E #206",
        "account": "123"
    },
    {"state": "North Dakota",
        "address": "456 30th st E #206",
        "account": "456"
    }
    ]

Given that it worked in the first case (write/read from disk), I don't know how to troubleshoot. How do I troubleshoot something in the buffer? The actual data is mostly text but has some number fields.

CodePudding user response:

I can't reproduce the error. This works for me:

import json
import pandas as pd
from io import StringIO

json_out = [
    {"state": "North Dakota",
     "address": "123 30th st E #206",
     "account": "123"
     },
    {"state": "North Dakota",
     "address": "456 30th st E #206",
     "account": "456"
     }
]

io = StringIO()
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())
print(df)

This leads me to believe there's something wrong with the code that appends the API data.
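One way to look inside the buffer is to parse its contents with the standard library and print where parsing fails. The sketch below is only a guess at the cause: it assumes (this is not shown in the question) that json.dump gets called on the same buffer more than once, which leaves several JSON documents back to back. The helper name describe_json_problem and the shortened records are made up for illustration.

```python
import json
from io import StringIO


def describe_json_problem(text):
    """Return None if text is one valid JSON document, else a short description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing failed
        start = max(exc.pos - 20, 0)
        return f"{exc.msg} at char {exc.pos}: {text[start:exc.pos + 20]!r}"
    return None


# Hypothetical, shortened records standing in for the API results
records = [{"state": "North Dakota", "account": "123"}]

# Assumed failure mode: dumping to the same buffer twice
buf = StringIO()
json.dump(records, buf)
json.dump(records, buf)
doubled_problem = describe_json_problem(buf.getvalue())

# Dumping the complete list once gives a single valid document
buf_ok = StringIO()
json.dump(records + records, buf_ok)
single_problem = describe_json_problem(buf_ok.getvalue())

print(doubled_problem)
print(single_problem)
```

The standard library reports the doubled buffer as "Extra data", which appears to be the same condition that the pandas parser words as "Trailing data". If this turns out to be the cause, collecting everything in json_out first and calling json.dump once should avoid it.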

However, if you have a list of dictionaries, you don't need the IO step. You can just do:

pd.DataFrame(json_out)

EDIT: I think I remember getting this error when there was a trailing comma in my JSON, like so:

[
  {
    "hello":"world",
  },
]
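To check whether trailing commas are the problem, the text can be run through the standard library parser, which rejects them. This is a minimal sketch using only the json module, not the pandas parser, so the exact error message will differ from the one in the traceback. The variable names are made up for illustration.

```python
import json

# The example above, written on one line
with_trailing_commas = '[{"hello": "world",},]'

try:
    json.loads(with_trailing_commas)
    trailing_comma_rejected = False
except json.JSONDecodeError as exc:
    trailing_comma_rejected = True
    # exc.pos is the character offset where the parser gave up
    print(f"invalid JSON at char {exc.pos}: {exc.msg}")

# json.dumps (and json.dump) never write trailing commas
round_trip = json.dumps([{"hello": "world"}])
print(round_trip)
```

Since json.dump does not emit trailing commas, this cause is more likely when the JSON text is hand-written or assembled by string concatenation than when it comes straight from json.dump.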