Home > Net >  How to recursively access every value of every key in dict, including nested ones?
How to recursively access every value of every key in dict, including nested ones?

Time:05-08

I have a dict that has an undetermined number of elements with undetermined number of child elements. Like this:

data3 {
  'data4': {
  'world.bmp': []}, 'world_elements.png': []
     }
data2 -> {'world_files.pdf': []}
world_building_budget.txt -> []
world_saving_budget.jpg -> []
inspectionProfiles -> {
    'Project_Default.xml': [], 'profiles_settings.xml': []
     }
.gitignore -> []
modules.xml -> []
file-tagging.iml -> []
workspace.xml -> []

I want to recursively iterate through every single key who's value is [] and this includes children and grandchildren keys.

I tried to do it this way, without recursion, but it didn't work:

for i in files_dict:
    for j, k in files_dict[i].items():
        print(j,"->", k)

CodePudding user response:

How about this:

def recursive_dict_access(some_dict: dict):
    print (f'recursive_dict_access called on {some_dict["name"]}')
    for key, value in some_dict.items():
        # if value is an empty list
        if isinstance(value, list) and not value:
            print(f'key: {key} from dict {some_dict["name"}]'}
        # if value is a dictionary
        elif isinstance(value, dict):
            print(f'recursively calling for {value["name"]}'}
            recursive_dict_access(value)

This assumes all dicts have a name key and is very verbose as an example for debugging to show how the code works. An example setup and usage:

a = {'name': 'a', 'key1', [], 'key2', []}
b = {'name': 'b', 'key1', a, 'key2': 'some other value'}
c = {'name': 'c', 'key1', b, 'key2': [], 'key3', []}

recursive_dict_access(c)

Output:

recursive_dict_access called on c
recusively calling for b
recursive_dict_access called on b
recursively calling for a
recursive_dict_access called on a
key: key1 from dict a
key: key2 from dict a
key: key2 from dict c
key: key3 from dict c

as expected, all keys in c (and all child dicts) with value [] have been printed. One thing to watch out for is infinite recursion due to circular references in the dictionary. For example:

c['key2'] = c
recursive_dict_access(c)

Output:

... lots of repeated keys ...
RecursionError: maximum recursion depth exceeded while calling a Python object

If this is a possibility in your data structure, you'll want to somehow mark those dicts that have already been process and ignore them if they get called again. I'll leave that as an exercise for the reader ;)

CodePudding user response:

How to seek all values without recursion: use a stack. It's harder to write the script, but it's probably faster or at least costs less memory, as you don't know how much resources Python allocates when invoking recursive functions.

Now, using recursive functions:

import sys

def write (*args, **kwargs):
    sys.stdout.write(*args, **kwargs)

def seek (data, level=0, cache=[]):
    dtype = type(data)
    tabs = '    '

    if (dtype is dict):
        if (data in cache):
            write ('__CircularReference__')
        else:
            cache.append(data)
            write ('{\n')
            level  = 1
            for k in data:
                write (level * tabs   '"' k '": ')
                seek(data[k], level, cache)
                write (',\n')
            level -= 1
            write (level * tabs   '}')
    elif (dtype is list):
        if (data in cache):
            write ('__CircularReference__')
        else:
            cache.append(data)
            write ('[\n')
            level  = 1
            c = 0
            for i in data:
                write (level * tabs   '['  str(c)  ']: ')
                seek(data[c], level, cache)
                write (',\n')
                c  = 1
            level -= 1
            write (level * tabs   ']')
    else:
        write (str(data))

def main ():
    data = {
        "a": 1,
        "b": "Yep",
        "list": [0, 1, 2, [{"data": "123", "another_list": [4, 5, 6]}, 7, 8, 9, 10] ],
    }

    data['circular'] = data

    seek(data)

if __name__ == '__main__':
    main()

The expected output should be:

{
    "a": 1,
    "b": Yep,
    "list": [
        [0]: 0,
        [1]: 1,
        [2]: 2,
        [3]: [
            [0]: {
                "data": 123,
                "another_list": [
                    [0]: 4,
                    [1]: 5,
                    [2]: 6,
                ],
            },
            [1]: 7,
            [2]: 8,
            [3]: 9,
            [4]: 10,
        ],
    ],
    "circular": __CircularReference__,
}

Now, I added a cache to detect Circular References, stated from @Tony Gweesip which was a few seconds faster than me when posting this answer. It was fun :)

  • Related