Python - iterate through each nested JSON entry to store two specific values at the same tree level

Time:09-30

So I have a JSON file which looks like this:

data = {
"path": "/",
"subPages": [
    {
        "path": "/1",
        "subPages": [
            {
                "path": "/12",
                "subPages": [
                    {
                        "path": "/123",
                        "subPages": [],
                        "url": "123_URL",
                    }
                ],
                "url": "12_URL",
            },
            {
                "path": "/13",
                "subPages": [
                    {
                        "path": "/131",
                        "subPages": [
                            {
                                "path": "/1311",
                                "subPages": [
                                    {
                                        "path": "/13111",
                                        "subPages": [],
                                        "url": "13111_URL",
                                    }
                                ],
                                "url": "1311_URL",
                            }
                        ],
                        "url": "131_URL",
                    }
                ],
                "url": "13_URL",
            }
        ],
        "url": "1_URL",
    }
]

}

I want to parse this JSON into a dictionary that maps each "path" to the "url" found at the same level. Something like:

result = {"/1": "1_URL", "/12": "12_URL", "/123": "123_URL", "/13": "13_URL"}

And so on. This has been difficult because I need to visit each level of the hierarchy separately to extract the two values, and the file may have another two levels of nesting to parse.

The challenge is that the "subPages" array always comes before the "url" key, so a traversal in key order descends into the children before it sees the parent's URL. My recursive approach failed because of this:

def json_extract(obj, key):
    arr = []

    def extract(obj, arr, key):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values

Do you have any idea how I can achieve this? Even just some logic you can point me to would help. Thanks in advance!

CodePudding user response:

Try:

def get_kv(o):
    if isinstance(o, dict):
        if "path" in o and "url" in o:
            yield o["path"], o["url"]
        for v in o.values():
            yield from get_kv(v)
    elif isinstance(o, list):
        for v in o:
            yield from get_kv(v)


print(dict(get_kv(data)))

Prints (reformatted here for readability):

{
    "/1": "1_URL",
    "/12": "12_URL",
    "/123": "123_URL",
    "/13": "13_URL",
    "/131": "131_URL",
    "/1311": "1311_URL",
    "/13111": "13111_URL",
}
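
Since the question starts from a JSON file, here is a minimal sketch of the same generator applied to parsed JSON text. The sample string is a shortened, hypothetical stand-in for the real file; with an actual file, json.load on the open file handle would take the place of json.loads. Note that strict JSON does not allow the trailing commas that appear in the Python literal in the question.

```python
import json

# Hypothetical, shortened stand-in for the contents of the JSON file.
raw = """
{
  "path": "/",
  "subPages": [
    {
      "path": "/1",
      "subPages": [
        {"path": "/12", "subPages": [], "url": "12_URL"}
      ],
      "url": "1_URL"
    }
  ]
}
"""


def get_kv(o):
    # Same generator as above: yield (path, url) for every dict that has both keys.
    if isinstance(o, dict):
        if "path" in o and "url" in o:
            yield o["path"], o["url"]
        for v in o.values():
            yield from get_kv(v)
    elif isinstance(o, list):
        for v in o:
            yield from get_kv(v)


pages = dict(get_kv(json.loads(raw)))
print(pages)
```

The root entry has a "path" but no "url", so it is skipped rather than raising a KeyError.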

CodePudding user response:

There is already an excellent answer by Andrej Kesely, so let's apply it to your json_extract function:

def json_extract(json_dct, key1, key2):
    dct = {}  # maps each key1 value to the key2 value found in the same dict

    def extract(json_dct, key1, key2):
        if isinstance(json_dct, dict):
            # Both keys live in the same dict, so their order does not matter.
            if key1 in json_dct and key2 in json_dct:
                dct[json_dct[key1]] = json_dct[key2]
            for v in json_dct.values():
                extract(v, key1, key2)
        elif isinstance(json_dct, list):
            for v in json_dct:
                extract(v, key1, key2)
        return dct

    result = extract(json_dct, key1, key2)
    return result

print(json_extract(data, "path", "url"))
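
If the structure is known to nest only through "subPages", a narrower walk is possible. The sketch below is a variant under that assumption, not part of the answer above: it follows only that key, uses an explicit stack instead of recursion, and skips nodes without a "url". The name collect_urls and the sample tree are hypothetical.

```python
def collect_urls(root):
    """Map each node's "path" to its "url", following only "subPages"."""
    result = {}
    pending = [root]  # nodes still to visit
    while pending:
        node = pending.pop()
        if "path" in node and "url" in node:
            result[node["path"]] = node["url"]
        # Reverse so children are popped in their original order.
        pending.extend(reversed(node.get("subPages", [])))
    return result


# Made-up tree with the same shape as the data in the question.
tree = {
    "path": "/",
    "subPages": [
        {
            "path": "/1",
            "subPages": [
                {"path": "/12", "subPages": [], "url": "12_URL"},
                {"path": "/13", "subPages": [], "url": "13_URL"},
            ],
            "url": "1_URL",
        }
    ],
}

print(collect_urls(tree))
```

Compared with walking every value, this does less work on large documents but silently misses pages stored under any other key, so the generic version is the safer default when the schema is not fixed.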

And if you are curious how your original approach could be made to work without looking up both keys on the same dictionary at once, check out the following. It pushes each key2 value onto a stack and pops it again when the matching key value turns up:

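def json_extract(obj, key, key2):
    stack = []  # key2 values still waiting for their matching key value
    dct = {}

    def extract(obj, dct, stack, key, key2):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if k == key2:
                    stack.append(v)
                if isinstance(v, (dict, list)):
                    extract(v, dct, stack, key, key2)
                elif k == key:
                    # Pair this value with the most recently pushed key2 value.
                    dct[stack.pop()] = v
        elif isinstance(obj, list):
            for item in obj:
                extract(item, dct, stack, key, key2)
        return dct

    result = extract(obj, dct, stack, key, key2)
    return result

# Note the argument order: the value key ("url") comes first, then the dictionary key ("path").
print(json_extract(data, "url", "path"))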
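
One caveat worth checking before relying on the stack-based version: it pairs values by position, so it appears to assume that every nested dictionary carrying the key2 entry also carries a key entry later on. The sketch below uses a condensed copy of the function (renamed pair_by_stack here, a hypothetical name) and a made-up input in which a child has no "url"; the parent's URL then gets attached to the child's path.

```python
def pair_by_stack(obj, key, key2):
    # Condensed copy of the stack-based json_extract above.
    stack, dct = [], {}

    def extract(obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if k == key2:
                    stack.append(v)
                if isinstance(v, (dict, list)):
                    extract(v)
                elif k == key:
                    dct[stack.pop()] = v
        elif isinstance(obj, list):
            for item in obj:
                extract(item)

    extract(obj)
    return dct


# Made-up input: the child "/a/b" has no "url".
sample = {
    "path": "/a",
    "subPages": [{"path": "/a/b", "subPages": []}],
    "url": "A_URL",
}

mismatched = pair_by_stack(sample, "url", "path")
print(mismatched)  # the parent's URL ends up under the child's path

# Looking both keys up on the same dict avoids the mismatch.
direct = {sample["path"]: sample["url"]}
print(direct)
```

For the data in the question this does not bite, because only the root lacks a "url" and its path simply stays at the bottom of the stack.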