I am trying to read a JSON file using pandas. The JSON file is in this format:
{
"category": "CRIME",
"headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV",
"authors": "Melissa Jeltsen",
"link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89", "short_description": "She left her husband. He killed their children. Just another day in America.",
"date": "2018-05-26"
}
{
"category": "ENTERTAINMENT",
"headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song",
"authors": "Andy McDonald",
"link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201",
"short_description": "Of course, it has a song.",
"date": "2018-05-26"
}
However, I get the following error, and I don't understand why:
ValueError Traceback (most recent call last)
/var/folders/j6/rj901v4j40368zfdw64pbf700000gn/T/ipykernel_11792/4234726591.py in <module>
----> 1 df = pd.read_json('db.json', lines=True)
2 df.head()
~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
205 else:
206 kwargs[new_arg_name] = new_arg_value
--> 207 return func(*args, **kwargs)
208
209 return cast(F, wrapper)
~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
610
611 with json_reader:
--> 612 return json_reader.read()
613
614
~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in read(self)
742 data = ensure_str(self.data)
743 data_lines = data.split("\n")
--> 744 obj = self._get_object_parser(self._combine_lines(data_lines))
745 else:
746 obj = self._get_object_parser(self.data)
~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
766 obj = None
767 if typ == "frame":
--> 768 obj = FrameParser(json, **kwargs).parse()
769
770 if typ == "series" or obj is None:
~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in parse(self)
878 self._parse_numpy()
879 else:
--> 880 self._parse_no_numpy()
881
882 if self.obj is None:
~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1131 if orient == "columns":
1132 self.obj = DataFrame(
-> 1133 loads(json, precise_float=self.precise_float), dtype=None
1134 )
1135 elif orient == "split":
ValueError: Expected object or value
My code is written as follows:
import pandas as pd
df = pd.read_json('db.json', lines=True)
df.head()
I tried changing the structure of the JSON file as suggested here, but it doesn't work: I get the same error as above. Is there any other way I can solve this issue?
Answer:
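The file is not valid JSON as it stands: it holds several objects one after another with nothing joining them. It is not JSON Lines either, because lines=True expects each object to sit on a single line, and here every object spans several lines. To make it valid JSON, wrap the objects in square brackets [] and add a comma between them.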
[{
"category": "CRIME",
"headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV",
"authors": "Melissa Jeltsen",
"link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89", "short_description": "She left her husband. He killed their children. Just another day in America.",
"date": "2018-05-26"
},
{
"category": "ENTERTAINMENT",
"headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song",
"authors": "Andy McDonald",
"link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201",
"short_description": "Of course, it has a song.",
"date": "2018-05-26"
}]
Then read the file, this time without lines=True:
import pandas as pd
df = pd.read_json("/path/to/file/db.json")
print(df)
Output:
category headline authors link short_description date
0 CRIME There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV Melissa Jeltsen https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89 She left her husband. He killed their children. Just another day in America. 2018-05-26
1 ENTERTAINMENT Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song Andy McDonald https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-... Of course, it has a song. 2018-05-26
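If editing the file by hand is not practical, another option is to parse the back-to-back objects directly. The sketch below is one possible approach, not something from the original thread: it uses json.JSONDecoder.raw_decode from the standard library to pull objects out of a string one at a time. The helper name read_concatenated_json is made up for illustration, and the sample string is a shortened stand-in for the real file.

```python
import json

import pandas as pd


def read_concatenated_json(text):
    """Hypothetical helper: parse several JSON objects written back to back."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return pd.DataFrame(records)


# Shortened sample in the same layout as the file from the question.
sample = """{
"category": "CRIME",
"date": "2018-05-26"
}
{
"category": "ENTERTAINMENT",
"date": "2018-05-26"
}"""

df = read_concatenated_json(sample)
print(df)
```

For the real file, the text would come from something like open('db.json', encoding='utf-8').read(), assuming the file is UTF-8 encoded. This reads the whole file into memory, so it suits small to medium files.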