Python dict recursion returns empty list

I have this dictionary that I am trying to iterate through recursively. When I hit a node whose key matches, I want to return that node, which is a list. With my current code I keep getting an empty list. I have stepped through the code and can see my check condition being hit, but the recursion still returns an empty value. What am I doing wrong here? Thanks.

dictionary data:

{
    "apiVersion": "v1",
    "kind": "Deployment",
    "metadata": {
        "name": "cluster",
        "namespace": "namespace",
    },
    "spec": {
        "template": {
            "metadata": {
                "labels": {
                    "app": "flink",
                    "cluster": "repo_name-cluster",
                    "component": "jobmanager",
                    "track": "prod",
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "jobmanager",
                        "image": "IMAGE_TAG_",
                        "imagePullPolicy": "Always",
                        "args": ["jobmanager"],
                        "resources": {
                            "requests": {"cpu": "100.0", "memory": "100Gi"},
                            "limits": {"cpu": "100.0", "memory": "100Gi"},
                        },
                        "env": [
                            {
                                "name": "ADDRESS",
                                "value": "jobmanager-prod",
                            },
                            {"name": "HADOOP_USER_NAME", "value": "yarn"},
                            {"name": "JOB_MANAGER_MEMORY", "value": "1000m"},
                            {"name": "HADOOP_CONF_DIR", "value": "/etc/hadoop/conf"},
                            {
                                "name": "TRACK",
                                "valueFrom": {
                                    "fieldRef": {
                                        "fieldPath": "metadata.labels['track']"
                                    }
                                },
                            },
                        ],
                    }
                ]
            },
        },
    },
}

code:

test = iterdict(data, "env")
print(test)

def iterdict(data, match):
    output = []
    if not isinstance(data, str):
        for k, v in data.items():
            print("key ", k)
            if isinstance(v, dict):
                iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    # print(v)
                    output += v
                    return output
                else:
                    for i in v:
                        iterdict(i, match)
    return output

expected return value:

[{'name': 'ADDRESS', 'value': 'jobmanager-prod'}, {'name': 'HADOOP_USER_NAME', 'value': 'yarn'}, {'name': 'JOB_MANAGER_MEMORY', 'value': '1000m'}, {'name': 'HADOOP_CONF_DIR', 'value': '/etc/hadoop/conf'}, {'name': 'TRACK', 'valueFrom': {...}}]

CodePudding user response：

When you recurse into iterdict, you throw away the return value. Since every value at the top level of your dictionary is either a string or a dict, the top-level call never reaches the list branch itself, so it ends up returning its own empty output list.
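To see the effect in isolation, here is a minimal sketch with two toy functions (collect_broken and collect_fixed are made-up names for illustration, not part of the question's code). The first calls itself and ignores the result, the second adds the result to its own list:

```python
def collect_broken(depth):
    """Recurse down to depth 0, but ignore what the recursive call returns."""
    output = []
    if depth == 0:
        return ["found"]
    collect_broken(depth - 1)  # the returned list is discarded here
    return output              # still the empty list created above


def collect_fixed(depth):
    """Same recursion, but the returned list is added to output."""
    output = []
    if depth == 0:
        return ["found"]
    output += collect_fixed(depth - 1)  # keep what the deeper call found
    return output


print(collect_broken(2))  # []
print(collect_fixed(2))   # ['found']
```

The deepest call finds the value in both versions; only the second one carries it back up through every level.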

You probably want to append the recursive outputs:

output += iterdict(v, match)

and

output += iterdict(i, match)

However, this is potentially inefficient as you will build a lot of intermediate lists. A better strategy might be to make your function a generator; the name iterdict would suggest this anyway. To do so, get rid of your output variable and the return statements, and use yield instead:

yield from iterdict(v, match)
yield from v
yield from iterdict(i, match)

and then, at the top level, you can just iterate over your results:

for value in iterdict(data, "env"):
    ...

or, if you really need a list, collect the generator output into a list:

test = list(iterdict(data, "env"))

This will likely be faster (no intermediate lists) and more Pythonic.
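One practical consequence of the generator approach is that results are produced on demand, so the caller can stop early. The sketch below assumes a generator shaped like the one described above; iter_matches and the small sample dictionary are illustrative names and data, not taken from the question:

```python
def iter_matches(data, match):
    """Yield each item of every list stored under a key equal to `match`."""
    if not isinstance(data, dict):
        return  # strings, numbers, etc. have nothing to search
    for key, value in data.items():
        if isinstance(value, dict):
            yield from iter_matches(value, match)
        elif isinstance(value, list):
            if key.lower() == match.lower():
                yield from value
            else:
                for item in value:
                    yield from iter_matches(item, match)


# Hypothetical, cut-down stand-in for the question's data.
sample = {
    "spec": {
        "containers": [
            {"name": "jobmanager", "env": [{"name": "TRACK"}, {"name": "ADDRESS"}]}
        ]
    }
}

matches = iter_matches(sample, "env")
first = next(matches, None)  # runs the search only until the first hit
print(first)                 # {'name': 'TRACK'}
print(list(matches))         # [{'name': 'ADDRESS'}], the remaining items
```

next(matches, None) returns None instead of raising StopIteration when nothing matches, which is convenient for a quick existence check.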

CodePudding user response：

You are not adding the results of the recursive calls to the output list. You can either extend output with each returned list or use the yield keyword to turn the function into a generator. Returning lists creates a temporary list at every level of the recursion, which costs memory and time. A generator avoids that:

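def iterdict(data, match):
    # Only dicts can be searched; strings and other scalars yield nothing.
    if not isinstance(data, dict):
        return

    for k, v in data.items():
        if isinstance(v, dict):
            # Pass matches found deeper in the tree up to the caller.
            yield from iterdict(v, match)
        elif isinstance(v, list):
            if k.lower() == match.lower():
                # Yield each element of the matching list.
                yield from v

            # Also search the list items for nested matches.
            for i in v:
                yield from iterdict(i, match)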


test = list(iterdict(data, "env"))
print(test)

CodePudding user response：

The issue with your code is that you are not updating the output list with the results of the recursive calls. When you call iterdict recursively, it returns a list of matches, but you never add that list to output. Instead, you should extend output with the returned list like this:

def iterdict(data, match):
    output = []
    if not isinstance(data, str):
        for k, v in data.items():
            print("key ", k)
            if isinstance(v, dict):
                output += iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    # print(v)
                    output += v
                    return output
                else:
                    for i in v:
                        output += iterdict(i, match)
    return output

test = iterdict(data, "env")
print(test)
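
If only the first matching list is needed, as the question describes, another option is to return it as soon as it is found and let each level pass it straight back up. This is a sketch under that assumption; find_first_list and the sample data are hypothetical names used for illustration:

```python
def find_first_list(data, match):
    """Return the first list stored under a key equal to `match`, else None."""
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, list) and key.lower() == match.lower():
                return value
            found = find_first_list(value, match)
            if found is not None:
                return found  # stop searching once a deeper call succeeds
    elif isinstance(data, list):
        for item in data:
            found = find_first_list(item, match)
            if found is not None:
                return found
    return None  # scalars, or containers with no match


# Hypothetical, cut-down stand-in for the question's data.
sample = {
    "spec": {
        "containers": [
            {"name": "jobmanager", "env": [{"name": "TRACK"}, {"name": "ADDRESS"}]}
        ]
    }
}

print(find_first_list(sample, "env"))      # [{'name': 'TRACK'}, {'name': 'ADDRESS'}]
print(find_first_list(sample, "missing"))  # None
```

Returning None for "not found" keeps an empty matching list distinguishable from no match at all, which is why the checks use "is not None" rather than truthiness.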