Convert list of path-like strings to nested dictionary of lists (arbitrary depth)

Time:10-14

The following is sample input data. Real data may have more levels than shown here, so the solution should work for an arbitrary depth:

['/SKA_20-VA-001/SKA_20-V-0546',
 '/SKA_20-VA-001/SKA_20-V-0148',
 '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
 '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
 '/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
 '/SKA_20-VA-001/SKA_20-V-0685',
 '/SKA_20-VA-001/SKA_20-V-0551']

I would like to turn this into a nested dictionary to store the hierarchical structure of the paths, where each parent directory becomes a dict key and all files become elements in a list.

The desired outcome:

{
  "SKA_20-VA-001": [
    "SKA_20-V-0546",
    "SKA_20-V-0148",
    "SKA_20-V-0685",
    "SKA_20-V-0551",
    {"SKA_20-LT-0028A": ["SKA_20-LI-0028A", "SKA_20-LI-0028B"]},
    {"SKA_20-PT-0034": ["SKA_20-PI-0034"]}
  ]
}

I have come across this answer, but it does not really solve my problem, or I am not able to modify it correctly.

CodePudding user response:

Such a structure is much easier to build if you treat files like directories. After building it, you can still convert it into the list/dict combination you are looking for.

In this case, elements without any children ({}) are expected to be files:

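def add_path(tree, split_path):
    # Create the node for the first path component if it does not exist yet.
    subtree = tree.setdefault(split_path[0], {})
    # Recurse into the remaining components, if any.
    if len(split_path) > 1:
        add_path(subtree, split_path[1:])


def parse_tree(paths):
    tree = {}
    for path in paths:
        # The leading "/" yields an empty first component, hence the "" root key.
        add_path(tree, path.split("/"))
    return tree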


if __name__ == "__main__":
    print(parse_tree(
        ['/SKA_20-VA-001/SKA_20-V-0546',
         '/SKA_20-VA-001/SKA_20-V-0148',
         '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
         '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
         '/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
         '/SKA_20-VA-001/SKA_20-V-0685',
         '/SKA_20-VA-001/SKA_20-V-0551']
    ))

results in

{
  "": {
    "SKA_20-VA-001": {
      "SKA_20-V-0546": {},
      "SKA_20-V-0148": {},
      "SKA_20-LT-0028A": {
        "SKA_20-LI-0028A": {},
        "SKA_20-LI-0028B": {}
      },
      "SKA_20-PT-0034": {
        "SKA_20-PI-0034": {}
      },
      "SKA_20-V-0685": {},
      "SKA_20-V-0551": {}
    }
  }
}

If you need help converting this structure into the exact one you asked for, let me know.
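For reference, here is a minimal sketch of that conversion. The helper names (build_tree, to_lists, convert) are made up for this illustration and are not part of the answer above. It assumes that a node without children is a file, so an empty directory cannot be told apart from a file, and it strips the leading "/" so that no empty-string root key appears.

```python
def build_tree(paths):
    """Build a dict-of-dicts tree; an empty dict marks a leaf (assumed to be a file)."""
    tree = {}
    for path in paths:
        node = tree
        # strip("/") removes the leading slash, so there is no "" root key.
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree


def to_lists(subtree):
    """Convert one directory's children into a list.

    File names come first (in insertion order), followed by one
    single-key dict per subdirectory.
    """
    files = [name for name, children in subtree.items() if not children]
    dirs = [{name: to_lists(children)}
            for name, children in subtree.items() if children]
    return files + dirs


def convert(paths):
    """Map each top-level directory name to its converted contents."""
    return {name: to_lists(children)
            for name, children in build_tree(paths).items()}
```

Calling convert() on the sample list from the question should give the structure shown under "The desired outcome", with the plain file names listed before the nested dicts. Recursion depth equals the path depth, so this should work for arbitrarily nested input as long as it stays below Python's recursion limit.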

CodePudding user response:

The package "extradict" has a NestedData class that can do this. Unfortunately, as of now, the path separator is hard-coded to a dot (.), meaning you have to replace the / in your data before feeding it in (and also strip the leading /).

After that, I think it can do everything you need. It will be simpler if your leaf paths are also "paths" that map to None, but if you need them to be simple strings, that could also be done.

All in all, the code is this:

from extradict import NestedData

data = ['/SKA_20-VA-001/SKA_20-V-0546',
 '/SKA_20-VA-001/SKA_20-V-0148',
 '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
 '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
 '/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
 '/SKA_20-VA-001/SKA_20-V-0685',
 '/SKA_20-VA-001/SKA_20-V-0551']

# Strip the leading "/" and use "." as the separator, as described above.
d1 = [d.strip("/").replace("/", ".") for d in data]
d2 = NestedData(dict.fromkeys(d1))

print(d2)

will print:

{'SKA_20-VA-001': {
    'SKA_20-V-0546': <NoneType>,
    'SKA_20-V-0148': <NoneType>,
    'SKA_20-LT-0028A': {
        'SKA_20-LI-0028A': <NoneType>,
        'SKA_20-LI-0028B': <NoneType>},
    'SKA_20-PT-0034': {'SKA_20-PI-0034': <NoneType>},
    'SKA_20-V-0685': <NoneType>,
    'SKA_20-V-0551': <NoneType>}}

And if you need the data structure as plain dicts, just use the .data attribute on d2:

print(d2.data)

which prints:

{'SKA_20-VA-001': {'SKA_20-V-0546': None, 'SKA_20-V-0148': None, 'SKA_20-LT-0028A': {'SKA_20-LI-0028A': None, 'SKA_20-LI-0028B': None}, 'SKA_20-PT-0034': {'SKA_20-PI-0034': None}, 'SKA_20-V-0685': None, 'SKA_20-V-0551': None}}
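If the leaves need to be simple strings inside lists, as in the question, a plain dict of this shape can be post-processed with a few lines of ordinary Python. The sketch below does not use extradict at all: it works on a hand-copied dict literal with the same shape as the output above, the function name leaves_to_lists is made up for this illustration, and it assumes that every top-level entry is a directory (a dict, not None).

```python
def leaves_to_lists(node):
    """Convert a dict whose leaves map to None into a list.

    Leaf names come first (in insertion order), followed by one
    single-key dict per nested directory.
    """
    files = [name for name, child in node.items() if child is None]
    dirs = [{name: leaves_to_lists(child)}
            for name, child in node.items() if child is not None]
    return files + dirs


# Plain dict with None leaves, copied by hand from the printed output.
plain = {
    "SKA_20-VA-001": {
        "SKA_20-V-0546": None,
        "SKA_20-V-0148": None,
        "SKA_20-LT-0028A": {"SKA_20-LI-0028A": None, "SKA_20-LI-0028B": None},
        "SKA_20-PT-0034": {"SKA_20-PI-0034": None},
        "SKA_20-V-0685": None,
        "SKA_20-V-0551": None,
    }
}

# Assumption: top-level values are dicts; a top-level None would be skipped.
result = {name: leaves_to_lists(child)
          for name, child in plain.items() if child is not None}
print(result)
```

Using None rather than an empty dict as the leaf marker means an empty directory ({}) would stay distinguishable from a file here and would come out as a key mapped to an empty list.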

Disclaimer: I am the author of extradict. You can install it with pip install extradict; the current version is 0.6.
