I am trying to open a JSON file and extract data from specific fields in Python (JupyterLab)

Time:01-22

I have thousands of Amazon product reviews in a JSON file. I need to process the data in Python and extract these fields: "reviewText", "overall", and "summary".

The JSON file looks like this:

{"reviewerID": "A11N155CW1UV02", "asin": "B000H00VBQ", "reviewerName": "AdrianaM", "helpful": [0, 0], "reviewText": "I had big expectations because I love English TV, in particular Investigative and detective stuff but this guy is really boring. It didn't appeal to me at all.", "overall": 2.0, "summary": "A little bit boring for me", "unixReviewTime": 1399075200, "reviewTime": "05 3, 2014"}
{"reviewerID": "A3BC8O2KCL29V2", "asin": "B000H00VBQ", "reviewerName": "Carol T", "helpful": [0, 0], "reviewText": "I highly recommend this series. It is a must for anyone who is yearning to watch \"grown up\" television. Complex characters and plots to keep one totally involved. Thank you Amazin Prime.", "overall": 5.0, "summary": "Excellent Grown Up TV", "unixReviewTime": 1346630400, "reviewTime": "09 3, 2012"}
{"reviewerID": "A60D5HQFOTSOM", "asin": "B000H00VBQ", "reviewerName": "Daniel Cooper \"dancoopermedia\"", "helpful": [0, 1], "reviewText": "This one is a real snoozer. Don't believe anything you read or hear, it's awful. I had no idea what the title means. Neither will you.", "overall": 1.0, "summary": "Way too boring for me", "unixReviewTime": 1381881600, "reviewTime": "10 16, 2013"}

I am trying this:

import json

with open('Amazon_Instant_Video_5.json') as json_file:
    data = json.load(json_file)
print(data['reviewText']['overal']['summary'])

But it gives me this error:

JSONDecodeError                           Traceback (most recent call last)
/var/folders/76/9lhw7d657y757vg308n_thww0000gn/T/ipykernel_4272/378691339.py in <module>
      2 
      3 with open('Amazon_Instant_Video_5.json') as json_file:
----> 4     data = json.load(json_file)
      5 print(data['reviewText']['overal']['summary'])

~/opt/anaconda3/lib/python3.9/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,

~/opt/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    344             parse_int is None and parse_float is None and
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
    348         cls = JSONDecoder

~/opt/anaconda3/lib/python3.9/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 394)

CodePudding user response:

That's JSON Lines format: each line is a separate, complete JSON document (here, one object per review). Read the file a line at a time and pass each line to json.loads():

import json

with open('Amazon_Instant_Video_5.json') as json_file:
    for line in json_file:
        data = json.loads(line)
        print(data['reviewText'], data['overall'], data['summary'])

The "Extra data" error occurs because json.load() expects the entire file to be a single JSON document. After parsing the object on the first line, it finds more content at line 2 column 1 instead of the end of the input, so it raises JSONDecodeError.
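
If you want to keep the three fields rather than print them, the same loop can collect them into a list. This is only a minimal sketch: the helper name extract_fields, the blank-line skip, and the choice to store None for a missing field are my own assumptions, and the sample lines are shortened versions of the data in the question.

```python
import json

FIELDS = ("reviewText", "overall", "summary")

def extract_fields(lines, fields=FIELDS):
    """Parse JSON Lines input and keep only the requested fields.

    `lines` can be an open file or any iterable of strings.
    Hypothetical helper, not part of the json module.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. an empty line at the end of the file
            continue
        review = json.loads(line)
        # .get() stores None when a field is missing (an assumption about
        # how incomplete reviews should be handled)
        records.append({name: review.get(name) for name in fields})
    return records

# Two shortened sample lines in the same shape as the question's data
sample = [
    '{"reviewerID": "A11N155CW1UV02", "reviewText": "This guy is really boring.", "overall": 2.0, "summary": "A little bit boring for me"}',
    '{"reviewerID": "A3BC8O2KCL29V2", "reviewText": "I highly recommend this series.", "overall": 5.0, "summary": "Excellent Grown Up TV"}',
]

records = extract_fields(sample)
for record in records:
    print(record["overall"], record["summary"])
```

With the real file, pass the open file object instead of the sample list, for example `with open('Amazon_Instant_Video_5.json') as json_file: records = extract_fields(json_file)`.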

CodePudding user response:

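If the file holds a single JSON document, you can use json.load(file_object) from the json module. It parses the whole file and returns the corresponding Python object (a dictionary for a JSON object), which you can then use to get the data. Note that this does not work on a file like the one in the question, which has one JSON object per line.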

Code Example

# Python program to read a JSON file
# that holds a single JSON document

import json

# Open the JSON file; the with block closes it automatically
with open('data.json') as f:
    # Returns the JSON object as a dictionary
    data = json.load(f)

# Iterate through the dictionary's keys and values
for key, value in data.items():
    print(key, ':', value)
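
Which of the two approaches applies depends on how the file is laid out. As a rough sketch, assuming the content fits in memory, you could try parsing it as one document first and fall back to one document per line. The helper name load_json_or_json_lines is hypothetical, and the sample strings stand in for file contents read with f.read().

```python
import json

def load_json_or_json_lines(text):
    """Return a list of parsed documents from `text`.

    Hypothetical helper: tries the text as a single JSON document first,
    then falls back to treating every non-blank line as its own document.
    """
    try:
        return [json.loads(text)]
    except json.JSONDecodeError:
        # Raised with "Extra data" when a second document follows the first
        return [json.loads(line) for line in text.splitlines() if line.strip()]

# A single JSON document, as json.load() expects
single = '{"overall": 5.0, "summary": "Excellent Grown Up TV"}'
# JSON Lines: one document per line, as in the question
multi = '{"overall": 2.0}\n{"overall": 1.0}\n'

print(load_json_or_json_lines(single))
print(load_json_or_json_lines(multi))
```

If a line in the fallback is not valid JSON either, the JSONDecodeError is left to propagate so the bad input is not silently ignored.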