I want to read the Electronics .json.gz file from the list of available Amazon datasets: http://jmcauley.ucsd.edu/data/amazon/qa/
JSON sample:
{'questionType': 'yes/no', 'asin': 'B00004U9JP', 'answerTime': 'Jun 27, 2014', 'unixTime': 1403852400, 'question': 'I have a 9 year old Badger 1 that needs replacing, will this Badger 1 install just like the original one?', 'answerType': '?', 'answer': 'I replaced my old one with this without a hitch.'}
{'questionType': 'open-ended', 'asin': 'B00004U9JP', 'answerTime': 'Apr 28, 2014', 'unixTime': 1398668400, 'question': 'model number', 'answer': 'This may help InSinkErator Model BADGER-1: Badger 1 1/3 HP Garbage Disposal PRODUCT DETAILS - Bellacor Number:309641 / UPC:050375000419 Brand SKU:500181'}
{'questionType': 'yes/no', 'asin': 'B00004U9JP', 'answerTime': 'Aug 25, 2014', 'unixTime': 1408950000, 'question': 'can I replace Badger 1 1/3 with a Badger 5 1/2 - with same connections?', 'answerType': '?', 'answer': 'Plumbing connections will vary with different models. Usually the larger higher amp draw wil not affect the wiring, the disposals are designed to a basic standard setup common to all brands. They want you to buy their brand or version or model. As long as the disposal is UL listed, United Laboratories, they will setup and bolt up the same.'}
{'questionType': 'yes/no', 'asin': 'B00004U9JP', 'answerTime': 'Nov 3, 2014', 'unixTime': 1415001600, 'question': 'Does this come with power cord and dishwasher hook up?', 'answerType': '?', 'answer': 'It does not come with a power cord. It does come with the dishwasher hookup.'}
{'questionType': 'open-ended', 'asin': 'B00004U9JP', 'answerTime': 'Jun 21, 2014', 'unixTime': 1403334000, 'question': 'loud noise inside when turned on. sounds like blades are loose', 'answer': 'Check if you dropped something inside.Usually my wife put lemons inside make a lot of noise and I will have to get them out using my hands or mechanical fingers .'}
{'questionType': 'open-ended', 'asin': 'B00004U9JP', 'answerTime': 'Jul 13, 2013', 'unixTime': 1373698800, 'question': 'where is the reset button located', 'answer': 'on the bottom'}
My current code calls pd.read_json with the lines and orient parameters set, but changing them doesn't help.
electronics_url = 'http://jmcauley.ucsd.edu/data/amazon/qa/qa_Electronics.json.gz'
electronics_df = pd.read_json(electronics_url, orient='split', lines=True, compression='gzip')
I get ValueError: Expected object or value. I tried every possible value of the orient parameter, but it does not help. I also tried opening the file from a local buffer, with no success.
What is the problem?
CodePudding user response:
The content of the archive is not valid JSON. Each line of the file is a Python dict literal (note the single-quoted strings), which is why pd.read_json raises the error. You can parse it with this snippet:
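import gzip
import ast
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request

data = []
url = 'http://jmcauley.ucsd.edu/data/amazon/qa/icdm/QA_Baby.json.gz'
with urllib.request.urlopen(url) as r:
    # Decompress the response stream and read it line by line
    for qa in gzip.open(r):
        # Each line is a Python dict literal, so evaluate it safely instead of parsing JSON
        data.append(ast.literal_eval(qa.decode('utf-8')))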
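To see why the JSON parser rejects these lines, here is a small sketch that feeds one record from the question's sample to both json.loads and ast.literal_eval. The variable names (line, parsed_as_json, record) are only illustrative.

```python
import ast
import json

# One line copied from the sample in the question (single-quoted keys and values)
line = (
    "{'questionType': 'open-ended', 'asin': 'B00004U9JP', "
    "'answerTime': 'Jul 13, 2013', 'unixTime': 1373698800, "
    "'question': 'where is the reset button located', 'answer': 'on the bottom'}"
)

try:
    json.loads(line)
    parsed_as_json = True
except json.JSONDecodeError:
    # JSON requires double-quoted strings, so the single quotes are rejected
    parsed_as_json = False

# The same text is a valid Python literal, so literal_eval returns a dict
record = ast.literal_eval(line)

print(parsed_as_json)
print(record['answer'])
```

ast.literal_eval only accepts literals (strings, numbers, lists, dicts and so on), so it does not execute arbitrary code the way eval would.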
Once data is populated, use pd.json_normalize to read the list of dicts:
answers = pd.json_normalize(data, ['questions', 'answers'])
print(answers)
# Output
       answerText                                         answererID      answerTime          helpful  answerType  answerScore
0      Yes, the locks will keep adults out too. My h...   A2WQX54BDMJTKY  November 6, 2013    [1, 1]   NaN         NaN
1      Yes if you install it correctly. a lot of fol...   A3VRA4069D8C7L  November 6, 2013    [0, 0]   NaN         NaN
2      It probably will... it's pretty good and much...   A3JEFPEUXUS0I   November 6, 2013    [0, 0]   NaN         NaN
3      The size of the locking mechanism. I bought th...  A1OCJ9L2PQJBUD  January 12, 2015    [0, 0]   NaN         NaN
4      The locking mechanism unlocks with the magnet .    A2KGWT9ZN4M1PO  January 14, 2015    [0, 0]   NaN         NaN
...    ...                                                ...             ...                 ...      ...         ...
82029  I feel it would work fine for the 4 year old. ...  A2BIFRN88PPMGT  September 17, 2014  [1, 1]   Y           0.9828
82030  In my opinion, the pillow was slightly bigger ...  AHM5QX41VSV6B   September 17, 2014  [0, 0]   ?           0.9411
82031  Our 2yo is a belly sleeper too. At first she w...  AKW750RUMWK17   August 28, 2014     [1, 1]   NaN         NaN
82032  Hi. Yes, the pillow will settle with use for s...  A1XQAY39M2KOL0  August 27, 2014     [0, 0]   NaN         NaN
82033  I would recommend contacting the company to se...  A1ZCGIRS68DM9J  August 28, 2014     [0, 0]   NaN         NaN
[82034 rows x 6 columns]
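Note that the example above uses the QA_Baby file, whose records nest answers under questions. The Electronics sample in the question looks flat (one dict per question), so the same line-by-line parsing should apply without a record path. Below is a minimal sketch under that assumption. The helper name read_literal_lines is hypothetical, and the in-memory sample is an abbreviated stand-in for the download, built from two records in the question.

```python
import ast
import gzip
import io


def read_literal_lines(fileobj):
    """Parse a gzip-compressed stream in which each line is a Python dict literal."""
    records = []
    # mode='rt' decompresses and decodes to text in one step
    with gzip.open(fileobj, mode='rt', encoding='utf-8') as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                records.append(ast.literal_eval(line))
    return records


# Offline stand-in for the real file: two flat records with some keys omitted
sample = (
    "{'questionType': 'yes/no', 'asin': 'B00004U9JP', 'answerType': '?', "
    "'answer': 'It does not come with a power cord. It does come with the dishwasher hookup.'}\n"
    "{'questionType': 'open-ended', 'asin': 'B00004U9JP', 'answer': 'on the bottom'}\n"
)
compressed = io.BytesIO(gzip.compress(sample.encode('utf-8')))

records = read_literal_lines(compressed)
print(len(records))
print(records[1]['answer'])
```

With the real file, the response object from urllib.request.urlopen(electronics_url) can be passed to the helper in place of the BytesIO buffer. Since the records are flat, pd.DataFrame(records) should then give one column per key, with NaN wherever a key such as answerType is missing.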