How to convert a string into a dict in a hierarchical data structure the functional way in Python?

Time:10-13

Let's assume I have the following Python data structure, where runtimeArgs contains a JSON string, not a Python dict.

[
   {
      "runid":"57d45a60-2b34-11ec-8b92-16c898d5a004",
      "properties":{
         "runtimeArgs":"{\"date\":\"2021_10_11\",\"logical.start.time\":\"1634026435347\"}",
         "phase-1":"7bc31901-2b34-11ec-9194-42010a0c0054"
      }
   },
   {
      "runid":"24f7887e-2b28-11ec-b60c-16c898d5a004",
      "properties":{
         "runtimeArgs":"{\"date\":\"2021_10_11\",\"logical.start.time\":\"1634021196053\"}",
         "phase-1":"4712bfa1-2b28-11ec-8968-42010a0c005a"
      }
   }
]

# this works
my_list[0]["properties"]['runtimeArgs']

# this does not work
my_list[0]["properties"]['runtimeArgs']['date']

With a for loop and json.loads() I could convert this to a Python dict, but I want to do it in a more functional way, e.g. with map or a list comprehension.

How is it possible to do this?

Expected result:

[
   {
      "runid":"57d45a60-2b34-11ec-8b92-16c898d5a004",
      "properties":{
         "runtimeArgs":{"date":"2021_10_11","logical.start.time":"1634026435347"},
         "phase-1":"7bc31901-2b34-11ec-9194-42010a0c0054"
      }
   },
   {
      "runid":"24f7887e-2b28-11ec-b60c-16c898d5a004",
      "properties":{
         "runtimeArgs":{"date":"2021_10_11","logical.start.time":"1634021196053"},
         "phase-1":"4712bfa1-2b28-11ec-8968-42010a0c005a"
      }
   }
]

# this should work
my_list[0]["properties"]['runtimeArgs']['date']

UPDATE:

One way I could figure out on my own (which I don't like) is this:

[{**x, "properties": {**x["properties"], "runtimeArgs": json.loads(x["properties"]["runtimeArgs"])}} for x in my_list if x]

Is there a nicer way to do this?

CodePudding user response:

Backing off a bit, why would you want to do this? Here are a few reasons I can think of, with appropriate solutions:

because I want to implement JSON parser myself, functionally, for the practice

In that case you're largely on your own, but an iterable tokeniser is probably the way to go.
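As a rough idea of what an iterable tokeniser could look like, here is a minimal generator-based sketch. The names TOKEN_RE and tokenize and the token categories are my own assumptions, and it only splits the text into tokens; building the parser on top is left to you.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token pattern: strings, numbers, literals, punctuation, whitespace.
TOKEN_RE = re.compile(
    r"""
    (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
  | (?P<literal>true|false|null)
  | (?P<punct>[{}\[\]:,])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Lazily yield (kind, text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.lastgroup != "space":
            yield match.lastgroup, match.group()


for kind, token in tokenize('{"date":"2021_10_11"}'):
    print(kind, token)
```

Because tokenize is a generator, it composes with map, filter and itertools without materialising the whole token list.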

because something as simple as getting element x from this json-formatted string should be easy, and I only ever need to do it once

In this case, use json.loads, but wrap your access in a function:

import json

def get_json(json_str, key):
    # Decode the JSON text, then pick out a single key.
    return json.loads(json_str)[key]

get_json(l[0]["properties"]["runtimeArgs"], "date")

This is a function, so I reckon that's functional. Here it is with a comprehension:

{x["runid"]: get_json(x["properties"]["runtimeArgs"], "date") for x in l}

because I want to get data out of the structure, deserialising as I go

Use a function:

import json

def get_parse(thing, key):
    try:
        return thing[key]
    except TypeError:
        # Indexing a str with a string key raises TypeError, so decode it first.
        return json.loads(thing)[key]

get_parse(get_parse(get_parse(l[0], "properties"), "runtimeArgs"), "date")

This function could be made recursive if you wanted, returning the innermost element.
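One possible shape for that, written as a fold with functools.reduce rather than explicit recursion: the helper get_path is a name I made up for this sketch, and get_parse is redefined here in a slightly different form so the block runs on its own.

```python
import json
from functools import reduce


def get_parse(thing, key):
    # Decode JSON text first, then index into the resulting container.
    if isinstance(thing, str):
        thing = json.loads(thing)
    return thing[key]


def get_path(thing, *keys):
    # Hypothetical helper: apply get_parse once per key, left to right.
    return reduce(get_parse, keys, thing)


record = {
    "runid": "57d45a60-2b34-11ec-8b92-16c898d5a004",
    "properties": {
        "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634026435347"}',
        "phase-1": "7bc31901-2b34-11ec-9194-42010a0c0054",
    },
}

print(get_path(record, "properties", "runtimeArgs", "date"))
```

The caller only lists the keys; where the JSON boundary sits inside the structure no longer matters.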

I don't know if these reasons properly cover your use-cases, but they might help. The basic approach (dear to functional programming!) is to put the difficult logic in a function, and then use that in your comprehensions if you like.
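Applied to the expected result in the question, that approach might look like the sketch below. The helper name with_parsed_args is my own, and it assumes every record has a "properties" dict with a JSON-encoded "runtimeArgs" string.

```python
import json


def with_parsed_args(record):
    # Build a new record instead of mutating the input; other keys are copied as-is.
    props = record["properties"]
    return {
        **record,
        "properties": {**props, "runtimeArgs": json.loads(props["runtimeArgs"])},
    }


my_list = [
    {
        "runid": "57d45a60-2b34-11ec-8b92-16c898d5a004",
        "properties": {
            "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634026435347"}',
            "phase-1": "7bc31901-2b34-11ec-9194-42010a0c0054",
        },
    },
    {
        "runid": "24f7887e-2b28-11ec-b60c-16c898d5a004",
        "properties": {
            "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634021196053"}',
            "phase-1": "4712bfa1-2b28-11ec-8968-42010a0c005a",
        },
    },
]

parsed = [with_parsed_args(r) for r in my_list]
# Equivalent with map: parsed = list(map(with_parsed_args, my_list))
print(parsed[0]["properties"]["runtimeArgs"]["date"])
```

The comprehension stays short because the nested dict unpacking lives in the helper, and "phase-1" survives since the inner dict is copied before "runtimeArgs" is replaced.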

JIT Parsing Solution

Purely for the fun of it, because this was nagging at me, here is a JS-like JIT parsing class:

from json import loads


class JITParser:
    def __init__(self, thing):
        # Strings also have __getitem__, so test for str explicitly.
        if isinstance(thing, str):
            self._thing = loads(thing)
        else:
            self._thing = thing

    def get(self, key):
        val = self._thing[key]
        if isinstance(val, dict):
            return JITParser(val)
        else:
            try:
                return JITParser(loads(val))
            except (TypeError, ValueError):
                # Not a str (TypeError) or not valid JSON (ValueError): return as-is.
                return val

    def __repr__(self):
        return f"JITParser with {repr(self._thing)}"


L = [
    {
        "runid": "57d45a60-2b34-11ec-8b92-16c898d5a004",
        "properties": {
            "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634026435347"}',
            "phase-1": "7bc31901-2b34-11ec-9194-42010a0c0054",
        },
    },
    {
        "runid": "24f7887e-2b28-11ec-b60c-16c898d5a004",
        "properties": {
            "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634021196053"}',
            "phase-1": "4712bfa1-2b28-11ec-8968-42010a0c005a",
        },
    },
]

j = JITParser(L[0])
j.get("runid")
j.get("properties")
j.get("properties").get("runtimeArgs")
j.get("properties").get("runtimeArgs").get("date")

This class will parse dicts or JSON representations of dicts, and return JITParser objects wrapping them until it hits something which can't be parsed as JSON, in which case it returns the value itself.

There are lots of possible improvements you could think of:

  • subclass dict and have [] access
  • implement recursive parsing for lists
  • handle other types, like objects with .dotaccess.
  • etc

but it was fun to mock up, and it might inspire you, so I'll leave it here. Do try it: it's quite fun.
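To give an idea of the first improvement, here is a rough sketch of a dict subclass with [] access. The class name JITDict is made up for illustration, and unlike the class above it only wraps values that decode to a JSON object, leaving every other string untouched.

```python
import json


class JITDict(dict):
    """Hypothetical dict subclass that decodes JSON-object strings on [] access."""

    def __getitem__(self, key):
        val = super().__getitem__(key)
        if isinstance(val, dict):
            return JITDict(val)
        if isinstance(val, str):
            try:
                parsed = json.loads(val)
            except ValueError:
                return val  # plain string, not JSON
            if isinstance(parsed, dict):
                return JITDict(parsed)
        return val


record = JITDict(
    {
        "runid": "57d45a60-2b34-11ec-8b92-16c898d5a004",
        "properties": {
            "runtimeArgs": '{"date":"2021_10_11","logical.start.time":"1634026435347"}',
            "phase-1": "7bc31901-2b34-11ec-9194-42010a0c0054",
        },
    }
)

print(record["properties"]["runtimeArgs"]["date"])
```

Only __getitem__ is overridden in this sketch, so .get(), .values() and iteration still return the raw stored values.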

CodePudding user response:

For this particular set of data you could do this:

L = [
   {
      "runid":"57d45a60-2b34-11ec-8b92-16c898d5a004",
      "properties":{
         "runtimeArgs":"{\"date\":\"2021_10_11\",\"logical.start.time\":\"1634026435347\"}",
         "phase-1":"7bc31901-2b34-11ec-9194-42010a0c0054"
      }
   },
   {
      "runid":"24f7887e-2b28-11ec-b60c-16c898d5a004",
      "properties":{
         "runtimeArgs":"{\"date\":\"2021_10_11\",\"logical.start.time\":\"1634021196053\"}",
         "phase-1":"4712bfa1-2b28-11ec-8968-42010a0c005a"
      }
   }
]

for d in L:
    print(eval(d['properties']['runtimeArgs'])['date'])
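Note that eval runs whatever expression the string contains, and it fails on JSON literals such as true, false and null, which are not Python names. If the strings may come from outside your program, a variant with json.loads is the safer bet. A small sketch follows; the flags example is invented to show the literal handling.

```python
import json

runtime_args = '{"date":"2021_10_11","logical.start.time":"1634021196053"}'
print(json.loads(runtime_args)["date"])

# JSON literals are mapped to their Python equivalents (True, None),
# whereas eval would raise NameError on `true` and `null`.
flags = json.loads('{"enabled": true, "owner": null}')
print(flags)
```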