I have lists where each entry is representing a nested structure, where /
represents each level in the structure.
['a','a/b/a','a/b','a/b/d',....]
I want to take such a list and return an index list where each level is sorted in alphabetical order.
If we had the following list
['a','a/b','a/b/a','a/c','a/c/a','b']
It represents the nested structure
'a': #1
'b': #1.1
'a': ... #1.1.1
'c': #1.2
'a': ... #1.2.1
'b' : ... #2
I am trying to get the output
['1','1.1','1.1.1', '1.2','1.2.1','2']
But I am having real issue on how to tackle the problem, would it be solved recursively? Or what would be a way to solve this for any generic list where each level is separated by /
? The list is originally not necessarily sorted, and each level can be any generic word.
CodePudding user response:
Here's what I have tried:
from operator import itemgetter
from functools import reduce
lst = ['a','a/b','a/b/a','a/c','a/c/a','b']
# build a dict
dct = {}
for keys in lst:
reduce(lambda d, k: d.setdefault(k, {}), keys.split('/'), dct)
print(dct) # {'a': {'b': {'a': {}}, 'c': {'a': {}}}, 'b': {}}
def read_dct(dct, prefix=None):
if not dct: # empty dict, i.e., at a leaf
return
sorted_items = sorted(dct.items(), key=itemgetter(0)) # sort on keys
for i, (_, v) in enumerate(sorted_items, start=1):
yield (current := f"{prefix}.{i}" if prefix else str(i))
yield from read_dct(v, current)
print([*read_dct(dct)]) # ['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
Basically, the first part builds a dictionary to represent the tree structure. And then I use a recursive function to make a list from the dict.
CodePudding user response:
Here's a similar solution to the accepted answer, but I think it might be more correct than that answer. If I understand the problem correctly, there should be exactly one value in the output list for each value in the input list. A input of ['a/b/c/d']
should result in ['1.1.1.1']
, not in a list with four values.
Anyway, here's my solution, with a couple of extra test cases:
def doit(inp):
def recursive_print(struct, sum=""):
if sum and struct[1]:
print(sum)
for i, key in enumerate(sorted(struct[0].keys())):
recursive_print(struct[0][key], sum ("." if sum else "") str(i 1))
struct = [{}, False]
for v in inp:
p = last = struct
for part in v.split('/'):
if part not in p[0]:
p[0][part] = [{}, False]
p = p[0][part]
p[1] = True
recursive_print(struct)
inp = ['a','a/b','a/b/a','a/c','a/c/a','b']
doit(inp)
print()
inp = ['a/b/c/d']
doit(inp)
print()
inp = ['joe','joe/sam','larry/curly/moe','jerry/jill','jerry/jill/tom','jerry/jill/tony','alice/jill/betty/davis/eyes','john']
doit(inp)
Result:
1
1.1
1.1.1
1.2
1.2.1
2
1.1.1.1
1.1.1.1.1
2.1
2.1.1
2.1.2
3
3.1
4
5.1.1
CodePudding user response:
Since the goal is to simply convert the paths to indices according to their respective positions against other paths of the same prefix, there is no need to build a tree at all. Instead, iterate over the paths in alphabetical order while using a dict of sets to keep track of the prefixes at each level of paths, and join the lengths of sets at each level for output:
def indices(paths):
output = {}
names = {}
for index, path in sorted(enumerate(paths), key=lambda t: t[1]):
counts = []
prefixes = tuple(path.split('/'))
for level, name in enumerate(prefixes):
prefix = prefixes[:level]
names.setdefault(prefix, set()).add(name)
counts.append(len(names[prefix]))
output[index] = '.'.join(map(str, counts))
return list(map(output.get, range(len(output))))
so that:
print(indices(['a', 'a/b', 'a/b/a', 'a/c', 'a/c/a', 'b']))
print(indices(['a', 'c', 'b', 'a/b']))
print(indices(['a/b/c/d', 'a/b/d', 'a/b/c']))
print(indices(['abc/d', 'bcc/d']))
print(indices(['apple/cat','apple/dog', 'banana/dog']))
outputs:
['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
['1', '3', '2', '1.1']
['1.1.1.1', '1.1.2', '1.1.1']
['1.1', '2.1']
['1.1', '1.2', '2.1']
Demo: https://replit.com/@blhsing/StainedMassivePi
CodePudding user response:
Since the parts of the string separated by /
can presumably have different lengths, you can't just sort the strings directly. However, by splitting the strings over the /
, you can get tuples, which you can sort directly in the way you want:
strings = ['a','a/b/a','a/b','a/b/d', 'b/a', 'b']
keys = sorted(map(lambda s: s.split('/'), strings))
print(keys)
Output:
[['a'], ['a', 'b'], ['a', 'b', 'a'], ['a', 'b', 'd'], ['b'], ['b', 'a']]