How to extract non-fixed keys from JSON lines and merge them into one JSON object

Time:09-27

I have a large JSON Lines file with about 30 million entries like these:

{"id":3,"price":"231","type":"Y","location":"NY"}
{"id":4,"price":"321","type":"N","city":"BR"}
{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}
--snip--
{"id":30373779,"price":"121","type":"N","city":"SR","location":"IU"}
{"id":30373780,"price":"432","type":"Y","location":"TB"}
{"id":30373780,"price":"562","type":"N","city":"CQ"}

How can I extract only the location and city fields in Python and merge them into one JSON object like this:

{
    "orders":{
        3:{
            "location":"NY"
        },
        4:{
            "city":"BR"
        },
        5:{
            "city":"XE",
            "location":"CP"
        },
        30373779:{
            "city":"SR",
            "location":"IU"
        },
        30373780:{
            "location":"TB"
        },
        30373780:{
            "city":"CQ"
        }
    }
}

P.S.: Pretty-printing the output is not necessary.

Answer:

Assuming your input file is actually in JSON Lines format, you can read each line, extract the city and location keys from the parsed dict, and add them to a new dict:

import json
from collections import defaultdict

orders = {'orders': defaultdict(dict)}
with open('orders.txt', 'r') as f:
    for line in f:
        o = json.loads(line)
        order_id = o['id']  # avoid shadowing the built-in id()
        if 'location' in o:
            orders['orders'][order_id]['location'] = o['location']
        if 'city' in o:
            orders['orders'][order_id]['city'] = o['city']

# json.dumps turns the integer ids into string keys
print(json.dumps(orders, indent=4))

Output for your sample data (note that it has two entries with id 30373780, so their values get merged into one dict):

{
    "orders": {
        "3": {
            "location": "NY"
        },
        "4": {
            "city": "BR"
        },
        "5": {
            "location": "CP",
            "city": "XE"
        },
        "30373779": {
            "location": "IU",
            "city": "SR"
        },
        "30373780": {
            "location": "TB",
            "city": "CQ"
        }
    }
}
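To try this without a file, the same approach can be wrapped in a function that accepts any iterable of lines. This is a minimal sketch, not the answerer's code: the function name `merge_locations` and its `keys` parameter are hypothetical, and it assumes every non-blank line is a JSON object with an `id` key. Note that everything is still held in memory, which matters for 30 million entries.

```python
import json
from collections import defaultdict


def merge_locations(lines, keys=("location", "city")):
    """Hypothetical helper: keep only `keys` per id; repeated ids are merged into one dict."""
    orders = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        obj = json.loads(line)
        for key in keys:
            if key in obj:
                orders[obj["id"]][key] = obj[key]
    return {"orders": dict(orders)}


sample = [
    '{"id":3,"price":"231","type":"Y","location":"NY"}',
    '{"id":30373780,"price":"432","type":"Y","location":"TB"}',
    '{"id":30373780,"price":"562","type":"N","city":"CQ"}',
]
result = merge_locations(sample)
print(json.dumps(result))
# {"orders": {"3": {"location": "NY"}, "30373780": {"location": "TB", "city": "CQ"}}}
```

The ids stay integers inside the Python dict; they only become quoted string keys when serialized with `json.dumps`.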

Answer:

Since your file is quite big and you probably don't want to keep all entries in memory, here is a way to consume the source file line by line and write the output immediately:

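import json

with open(r"in.jsonp") as i_f, open(r"out.json", "w") as o_f:
    o_f.write('{"orders":{')
    first = True
    for i in i_f:
        if not i.strip():
            continue  # skip blank lines
        i_obj = json.loads(i)
        # Put the comma before every entry except the first, so there is no trailing comma
        if not first:
            o_f.write(",")
        first = False
        o_f.write(f'{i_obj["id"]}:')  # unquoted numeric key, as in the question
        o_obj = {}
        if location := i_obj.get("location"):  # walrus operator needs Python 3.8+
            o_obj["location"] = location
        if city := i_obj.get("city"):
            o_obj["city"] = city
        json.dump(o_obj, o_f)
    o_f.write('}}')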

It generates a semi-valid JSON object in the same format you provided in your question (the numeric keys are unquoted, so a strict JSON parser will reject it).
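If you need output that a strict parser such as `json.loads` accepts, one variation is to write each id as a quoted string key and emit the comma before every entry except the first. This is a minimal sketch under the same JSON Lines assumption; the function name `stream_orders` is hypothetical, and it takes already-open file objects so it can be exercised with `io.StringIO`. Repeated ids are still written as separate members, which is syntactically valid JSON, but Python's `json.loads` keeps only the last value for a duplicate key.

```python
import io
import json


def stream_orders(in_f, out_f, keys=("location", "city")):
    """Hypothetical helper: write {"orders": {...}} one entry at a time."""
    out_f.write('{"orders":{')
    first = True
    for line in in_f:
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        if not first:
            out_f.write(",")  # separator before every entry except the first
        first = False
        # json.dumps(str(...)) yields a properly quoted JSON string key
        out_f.write(json.dumps(str(obj["id"])) + ":")
        json.dump({k: obj[k] for k in keys if k in obj}, out_f)
    out_f.write("}}")


src = io.StringIO(
    '{"id":3,"price":"231","type":"Y","location":"NY"}\n'
    '{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}\n'
)
dst = io.StringIO()
stream_orders(src, dst)
print(dst.getvalue())
# {"orders":{"3":{"location": "NY"},"5":{"location": "CP", "city": "XE"}}}
```

With real files, open them in a `with` statement as in the answer above and pass the two file objects in.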

