JSON File [https://drive.google.com/file/d/1Jb3OdoffyA71vYfojxLedZNPDLq9bn7b/view?usp=sharing]
I am trying to read a JSON file in Python; the file is the one linked above. The code I wrote looks like this:
lst = []
for line in open(json_path, 'r'):
    lst.append(json.loads(line))
But for some reason I keep getting this error: JSONDecodeError: Expecting value: line 2 column 1 (char 1)
Did I do something wrong in the code, or does the JSON file have an error in it?
CodePudding user response:
Update
You can strip the line break (\n) from each line before calling the json.loads function, and skip any line that is empty afterwards (thanks to @DeepSpace for the comment):
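import json

lst = []
for line in open("sample.json", 'r'):
    # Drop the trailing newline so blank lines become empty strings
    stripped = line.strip("\n")
    # Parse only the lines that still contain something
    if stripped != "":
        lst.append(json.loads(stripped))
lst  # displays the result in a notebook or REPL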
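To make the cause of the error concrete, here is a minimal sketch that feeds `json.loads` the same input a blank line in the file would produce. The sample strings and the variable names `blank_line_error` and `record` are assumptions made up for illustration; they are not taken from the linked file.

```python
import json

# A blank line in the file reaches json.loads as "\n", which holds no JSON value.
try:
    json.loads("\n")
    blank_line_error = None
except json.JSONDecodeError as exc:
    blank_line_error = str(exc)

print(blank_line_error)

# A line that does contain JSON parses fine, even with its trailing newline,
# because json.loads ignores surrounding whitespace.
record = json.loads('{"id": "lni001"}\n')
print(record)
```

So the trailing newline on a data line is harmless by itself; the check that matters is skipping the lines that are empty once the newline is stripped.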
You can also use the ast module:
import ast

lst = []
for line in open("sample.json", 'r'):
    if line.strip("\n") != "":
        lst.append(ast.literal_eval(line))
Explanation
ast.literal_eval parses a string that contains a Python literal, such as "[1,2,3]", into the corresponding Python object, such as the list [1,2,3].
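A short sketch of that behaviour, with one caveat worth knowing: `ast.literal_eval` understands Python literals, not JSON, so it works on this data only because the records happen to be valid Python as well. The string `json_text` below is an invented example, not part of the linked file.

```python
import ast
import json

# A string holding a Python literal is turned into the matching object.
parsed_list = ast.literal_eval("[1,2,3]")
print(parsed_list)

# JSON-only tokens such as true and null are not Python literals,
# so literal_eval rejects them while json.loads accepts them.
json_text = '{"active": true, "note": null}'
try:
    ast.literal_eval(json_text)
    literal_eval_failed = False
except (ValueError, SyntaxError):
    literal_eval_failed = True

parsed_json = json.loads(json_text)
print(literal_eval_failed, parsed_json)
```

For files that may contain `true`, `false`, or `null`, the `json.loads` version is the safer choice.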
The output of both snippets above would be:
[{'content': [{'c_id': '002',
               'p_id': 'P02',
               'source': 'internet',
               'type': 'org'},
              {'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
              {'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
  'doc_id': '7098727',
  'id': 'lni001',
  'pub_date': '20220301',
  'unique_id': '64WP-UI-POLI'},
 {'content': [{'c_id': '002',
               'p_id': 'P02',
               'source': 'internet',
               'type': 'org'},
              {'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
              {'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
  'doc_id': '7098727',
  'id': 'lni001',
  'pub_date': '20220301',
  'unique_id': '64WP-UI-POLI'},
 {'content': [{'c_id': '002',
               'p_id': 'P02',
               'source': 'internet',
               'type': 'org'},
              {'c_id': '003', 'p_id': 'P03', 'source': 'internet', 'type': 'org'},
              {'c_id': '005', 'p_id': 'K01', 'source': 'news', 'type': 'people'}],
  'doc_id': '7098727',
  'id': 'lni001',
  'pub_date': '20220301',
  'unique_id': '64WP-UI-POLI'},
 {'content': [{'c_id': '012',
               'p_id': 'K21',
               'source': 'internet',
               'type': 'location'},
              {'c_id': '034', 'p_id': 'P17', 'source': 'news', 'type': 'people'},
              {'c_id': '098', 'p_id': 'K54', 'source': 'news', 'type': 'people'}],
  'doc_id': '7097889',
  'id': 'lni002',
  'pub_date': '20220301',
  'unique_id': '64WP-UI-CFGT'},
 {'content': [{'c_id': '012',
               'p_id': 'K21',
               'source': 'internet',
               'type': 'location'},
              {'c_id': '034', 'p_id': 'P17', 'source': 'news', 'type': 'people'},
              {'c_id': '098', 'p_id': 'K54', 'source': 'news', 'type': 'people'}],
  'doc_id': '7097889',
  'id': 'lni002',
  'pub_date': '20220301',
  'unique_id': '64WP-UI-CFGT'}]
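If the same file layout (one JSON object per line, with blank lines in between) needs reading in more than one place, the loop could be wrapped in a small helper. This is only a sketch: the function name `read_json_lines` is hypothetical, and the UTF-8 encoding is an assumption about the file.

```python
import json


def read_json_lines(path):
    """Return one parsed object per non-blank line of the file at path."""
    records = []
    # The with block closes the file even if a line fails to parse.
    # encoding="utf-8" is an assumption; adjust it to match the actual file.
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:  # skip blank or whitespace-only lines
                records.append(json.loads(stripped))
    return records
```

Calling `read_json_lines("sample.json")` should then give the same list as the loops above, assuming the file is laid out as described.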