I was provided with a JSON file which looks something like below when opened with Atom:
["[{\"column1\":value1,\"column2\":value2,\"column3\":value3 ...
I tried loading it in Jupyter with pandas read_json as such:
data = pd.read_json('filename.json', orient = 'records')
And when I print data.head()
, it shows the result below:
screenshot of results
I have also tried the following:
import json
with open('filename.json', 'r') as file:
data = json.load(file)
When I check with type(data)
I see that it is a list
. When I check with data[0][1]
, it returns me {
i.e. it seems that the characters in the file has been loaded as a single element in the list?
Just wondering if I am missing anything? I am expecting the JSON file to be loaded as a dataframe so that I can analyze the data inside. Appreciate any guidance and advice. Thanks in advance!
CodePudding user response:
Ok so I think as head()
only shows one entry that the outer brackets are not needed. I would try to read your file as a string and change the string to something that pd.read_json()
can parse. I assume that your file contains data in a form like this:
["[{\"column1\":2,\"column2\":\"value2\",\"column3\":4}, {\"column1\":4,\"column2\":\"value2\",\"column3\":8}]"]
Now, I would read it and remove trailing \n
if they exist and correct the automatic escaping of the read()
method. Then I remove ["
and "]
from the string with this code:
with open('input.json', 'r') as file:
data = file.read().rstrip()
cleaned_string = data.replace('\\', '')[2:-2]
The result is now a valid json string that looks like this:
'[{"column1":2,"column2":"value2","column3":4}, {"column1":4,"column2":"value2","column3":8}]'
This string can now be easily read by pandas with this line:
pd.read_json(cleaned_string, orient = 'records')
Output:
column1 column2 column3
0 2 value2 4
1 4 value2 8
The specifics (e.g. the indices to remove unused characters) could be different for your string as I do not know your input. However, I think this approach allows you to read your data.