JSON file loaded as one row, one column using pandas read_json; expecting a full dataframe-CodePudding

I was provided with a JSON file which looks something like below when opened with Atom:

["[{\"column1\":value1,\"column2\":value2,\"column3\":value3 ...

I tried loading it in Jupyter with pandas read_json as such:

data = pd.read_json('filename.json', orient = 'records')

And when I print data.head(), it shows the result below: screenshot of results

I have also tried the following:

import json
with open('filename.json', 'r') as file:
   data = json.load(file)

When I check with type(data) I see that it is a list. When I check with data[0][1], it returns me { i.e. it seems that the characters in the file has been loaded as a single element in the list?

Just wondering if I am missing anything? I am expecting the JSON file to be loaded as a dataframe so that I can analyze the data inside. Appreciate any guidance and advice. Thanks in advance!

CodePudding user response：

Ok so I think as head() only shows one entry that the outer brackets are not needed. I would try to read your file as a string and change the string to something that pd.read_json() can parse. I assume that your file contains data in a form like this:

["[{\"column1\":2,\"column2\":\"value2\",\"column3\":4}, {\"column1\":4,\"column2\":\"value2\",\"column3\":8}]"]

Now, I would read it and remove trailing \n if they exist and correct the automatic escaping of the read() method. Then I remove [" and "] from the string with this code:

with open('input.json', 'r') as file:
    data = file.read().rstrip()

cleaned_string = data.replace('\\', '')[2:-2]

The result is now a valid json string that looks like this:

'[{"column1":2,"column2":"value2","column3":4}, {"column1":4,"column2":"value2","column3":8}]'

This string can now be easily read by pandas with this line:

pd.read_json(cleaned_string, orient = 'records')

Output:

    column1 column2 column3
0   2   value2  4
1   4   value2  8

The specifics (e.g. the indices to remove unused characters) could be different for your string as I do not know your input. However, I think this approach allows you to read your data.