The following is sample input data. Real data may have more levels than shown here, so the solution should work for an arbitrary depth:
['/SKA_20-VA-001/SKA_20-V-0546',
'/SKA_20-VA-001/SKA_20-V-0148',
'/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
'/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
'/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
'/SKA_20-VA-001/SKA_20-V-0685',
'/SKA_20-VA-001/SKA_20-V-0551']
I would like to turn this into a nested dictionary to store the hierarchical structure of the paths, where each parent directory becomes a dict key and all files become elements in a list.
The desired outcome:
{
    "SKA_20-VA-001": [
        "SKA_20-V-0546",
        "SKA_20-V-0148",
        "SKA_20-V-0685",
        "SKA_20-V-0551",
        {"SKA_20-LT-0028A": ["SKA_20-LI-0028A", "SKA_20-LI-0028B"]},
        {"SKA_20-PT-0034": ["SKA_20-PI-0034"]}
    ]
}
I have come across this answer, but it does not really solve my problem, or at least I am not able to adapt it correctly.
CodePudding user response:
Such a structure is much easier to build if you treat files like directories. After building it, you can still convert it into the list/dict combination that you were looking for.
In this case, elements without any children ({}) are expected to be files:
def add_path(tree, split_path):
    # Create (or reuse) the node for the first path component,
    # then recurse into it with the remaining components.
    subtree = tree.setdefault(split_path[0], {})
    if len(split_path) > 1:
        add_path(subtree, split_path[1:])


def parse_tree(paths):
    tree = {}
    for path in paths:
        # The leading "/" makes the first component an empty string,
        # so everything ends up under a "" root key.
        add_path(tree, path.split("/"))
    return tree


if __name__ == "__main__":
    print(parse_tree(
        ['/SKA_20-VA-001/SKA_20-V-0546',
         '/SKA_20-VA-001/SKA_20-V-0148',
         '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
         '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
         '/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
         '/SKA_20-VA-001/SKA_20-V-0685',
         '/SKA_20-VA-001/SKA_20-V-0551']
    ))
results in
{
    "": {
        "SKA_20-VA-001": {
            "SKA_20-V-0546": {},
            "SKA_20-V-0148": {},
            "SKA_20-LT-0028A": {
                "SKA_20-LI-0028A": {},
                "SKA_20-LI-0028B": {}
            },
            "SKA_20-PT-0034": {
                "SKA_20-PI-0034": {}
            },
            "SKA_20-V-0685": {},
            "SKA_20-V-0551": {}
        }
    }
}
If you need help converting this structure into the exact one you asked for, let me know.
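For reference, one possible way to do that conversion is sketched below. The helper names to_listing and convert are made up for this example. It assumes that any node without children is a file, so an empty directory cannot be told apart from a file, and it lists files before sub-directories as in the question.

```python
def to_listing(tree):
    """Turn one directory's dict-of-dicts into a list.

    Childless entries become plain strings; entries with children
    become single-key dicts whose value is again a list.
    """
    files = [name for name, children in tree.items() if not children]
    dirs = [{name: to_listing(children)}
            for name, children in tree.items() if children]
    return files + dirs  # files first, then sub-directories


def convert(tree):
    # The leading "/" in each path yields an empty-string root key; skip it.
    root = tree.get("", tree)
    return {name: to_listing(children) for name, children in root.items()}


# Shortened version of the tree printed above.
tree = {"": {"SKA_20-VA-001": {
    "SKA_20-V-0546": {},
    "SKA_20-LT-0028A": {"SKA_20-LI-0028A": {}, "SKA_20-LI-0028B": {}},
    "SKA_20-V-0685": {},
}}}
print(convert(tree))
```

Note that a childless entry directly under the root would map to an empty list here, since the top level is kept as a dict.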
CodePudding user response:
The package "extradict" has a NestedData class which can arrange for this.
Unfortunately, as of now, the path separator is hard-coded to a dot (.), meaning you have to replace the / in your data before feeding it in (and also strip the leading /).
After that, I think it can do everything you need. It will be simpler if your leaf paths are also "paths" that map to None, but if you need them to be simple strings, that could also be done.
All in all, the code is this:
from extradict import NestedData

data = ['/SKA_20-VA-001/SKA_20-V-0546',
        '/SKA_20-VA-001/SKA_20-V-0148',
        '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028A',
        '/SKA_20-VA-001/SKA_20-LT-0028A/SKA_20-LI-0028B',
        '/SKA_20-VA-001/SKA_20-PT-0034/SKA_20-PI-0034',
        '/SKA_20-VA-001/SKA_20-V-0685',
        '/SKA_20-VA-001/SKA_20-V-0551']

# Strip the leading "/" and switch to the "." separator NestedData expects.
d1 = [d.lstrip("/").replace("/", ".") for d in data]
# Every path becomes a key mapping to None.
d2 = NestedData(dict.fromkeys(d1))
print(d2)
will print:
{'SKA_20-VA-001': {
    'SKA_20-V-0546': <NoneType>,
    'SKA_20-V-0148': <NoneType>,
    'SKA_20-LT-0028A': {
        'SKA_20-LI-0028A': <NoneType>,
        'SKA_20-LI-0028B': <NoneType>},
    'SKA_20-PT-0034': {'SKA_20-PI-0034': <NoneType>},
    'SKA_20-V-0685': <NoneType>,
    'SKA_20-V-0551': <NoneType>}}
And if you need the data structure as plain dicts, just use the .data attribute on d2:
In [33]: print(d2.data)
{'SKA_20-VA-001': {'SKA_20-V-0546': None, 'SKA_20-V-0148': None, 'SKA_20-LT-0028A': {'SKA_20-LI-0028A': None, 'SKA_20-LI-0028B': None}, 'SKA_20-PT-0034': {'SKA_20-PI-0034': None}, 'SKA_20-V-0685': None, 'SKA_20-V-0551': None}}
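If the leaves should end up as plain strings, as in the question, the plain dict shown above could be post-processed along these lines. This is only a sketch: leaves_to_strings is a hypothetical helper, not part of extradict, and the input is a shortened hand-copied version of the printed dict so the snippet runs without the package installed. It assumes every top-level entry is a directory (a dict, not None).

```python
def leaves_to_strings(node):
    """Turn {name: None or dict} into a list of names and single-key dicts."""
    out = []
    for name, child in node.items():
        if child is None:
            out.append(name)  # leaf: keep just the name
        else:
            out.append({name: leaves_to_strings(child)})  # sub-directory
    return out


# Shortened copy of the d2.data output shown above.
plain = {'SKA_20-VA-001': {
    'SKA_20-V-0546': None,
    'SKA_20-LT-0028A': {'SKA_20-LI-0028A': None, 'SKA_20-LI-0028B': None},
    'SKA_20-V-0685': None,
}}

# Assumes each top-level value is a dict; a None here would need extra handling.
result = {name: leaves_to_strings(child) for name, child in plain.items()}
print(result)
```

Unlike the listing in the question, this keeps entries in their original order, so files and sub-directories may be interleaved.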
Disclaimer: I am the author of extradict. You can install it with pip install extradict (the current version is 0.6).