How to wrap JSON file into a list

Time:10-02

I am struggling to convert this data into a list to be used in Python.

The file contains large sets of data and it is in JSON format.

Here is a sample of the data:

{"_id":{"$oid":"60551"},"barcode":"511111019862","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac114be37ce2ead437550"},"$ref":"Cogs"},"name":"test brand @1612366101024","topBrand":false}
{"_id":{"$oid":"601c5460be37ce2ead43755f"},"barcode":"511111519928","brandCode":"STARBUCKS","category":"Beverages","categoryCode":"BEVERAGES","cpg":{"$id":{"$oid":"5332f5fbe4b03c9a25efd0ba"},"$ref":"Cogs"},"name":"Starbucks","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755d"},"barcode":"511111819905","brandCode":"TEST BRANDCODE @1612366146176","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand @1612366146176","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755a"},"barcode":"511111519874","brandCode":"TEST BRANDCODE @1612366146051","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand @1612366146051","topBrand":false}

Here is the code I ran:

import json

with open("brands.json") as f:
    data = json.load(f)

print(data)

And here is the error I get:

raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 229)

CodePudding user response:

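Your file is not one JSON document: it holds one JSON object per line, so json.load() stops after the first object and reports "Extra data". Read the file line by line, convert each line to a dict, and append it to a list: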

import json
data = []
with open('brands.json') as f:
    for line in f:
        data.append(json.loads(line.strip()))
print(data)

Output:

[{'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}]

The same result with less code, using a list comprehension:

import json
with open('brands.json') as f:
    data = [json.loads(line.strip()) for line in f]
print(data)
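
A slightly more defensive variant of the loop above, as a minimal sketch. It assumes the file may contain blank lines (which json.loads() would reject) and that you want to know which line is malformed. The helper name parse_json_lines is hypothetical, not part of the json module.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON object per line.

    Hypothetical helper for illustration. An open file object works as
    `lines`, because iterating over a file yields its lines.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Blank lines are not valid JSON, so skip them.
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so the bad record is easy to find.
            raise ValueError(f"Invalid JSON on line {line_number}: {exc}") from exc
    return records
```

To use it on the file from the question, pass the open file: with open("brands.json") as f: data = parse_json_lines(f). Taking an iterable of lines rather than a path keeps the parsing logic easy to try out on a plain list of strings.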

CodePudding user response:

If I understood correctly, you want to convert the data in the "brands.json" file into a list.

First, after opening a file you need to read its contents. To read all the lines:

with open("brands.json", 'r') as f:
    read_lines = f.readlines()

Then parse each line and append the result to a list:

import json

data = []
with open("brands.json", 'r') as f:
    read_lines = f.readlines()
    for lines_of_data in read_lines:
        line_json = json.loads(lines_of_data.strip())
        data.append(line_json)

This gives you a list of dicts, which looks like:

[{'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}]

Alternatively, you can load the data into a dict keyed by each record's ID (much easier to work with):

import json

data = {}
with open("brands.json", 'r') as f:
    read_lines = f.readlines()
    for lines_of_data in read_lines:
        line_json = json.loads(lines_of_data.strip())
        line_id = line_json['_id']['$oid']
        data[line_id] = line_json

This way you get a dict that uses each record's "$oid" as its key. It looks like:

{'60551': {'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, '601c5460be37ce2ead43755f': {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, '601ac142be37ce2ead43755d': {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, '601ac142be37ce2ead43755a': {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}}

I find this dict much easier to work with.
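
To illustrate why the keyed dict is convenient, here is a small sketch of building it from an already-parsed list and looking a record up by ID. The helper name index_by_oid is hypothetical, and the sample records are shortened copies of the data in the question. Unlike the loop above, this version assumes you want an error on a repeated "$oid" rather than silently overwriting the earlier record.

```python
def index_by_oid(records):
    """Build a dict mapping each record's "$oid" to the record itself.

    Hypothetical helper for illustration; raises on duplicate IDs
    instead of letting a later record replace an earlier one.
    """
    index = {}
    for record in records:
        oid = record["_id"]["$oid"]
        if oid in index:
            raise ValueError(f"Duplicate $oid: {oid}")
        index[oid] = record
    return index


# Shortened sample records taken from the question's data.
records = [
    {"_id": {"$oid": "60551"}, "barcode": "511111019862", "category": "Baking"},
    {"_id": {"$oid": "601c5460be37ce2ead43755f"}, "barcode": "511111519928", "name": "Starbucks"},
]

index = index_by_oid(records)
print(index["601c5460be37ce2ead43755f"]["name"])  # Starbucks
```

A lookup by ID is then a single dict access, instead of a scan through the whole list.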
