In python, create index from flat representation of nested structure in a list, sorting by alphabeti-CodePudding

I have lists where each entry is representing a nested structure, where / represents each level in the structure.

['a','a/b/a','a/b','a/b/d',....]

I want to take such a list and return an index list where each level is sorted in alphabetical order.

If we had the following list

['a','a/b','a/b/a','a/c','a/c/a','b']

It represents the nested structure

'a':                   #1

    'b':               #1.1
         'a': ...      #1.1.1
    'c':               #1.2
         'a': ...      #1.2.1
'b' : ...              #2

I am trying to get the output

 ['1','1.1','1.1.1', '1.2','1.2.1','2']

But I am having real issue on how to tackle the problem, would it be solved recursively? Or what would be a way to solve this for any generic list where each level is separated by /? The list is originally not necessarily sorted, and each level can be any generic word.

CodePudding user response：

Here's what I have tried:

from operator import itemgetter
from functools import reduce

lst = ['a','a/b','a/b/a','a/c','a/c/a','b']

# build a dict
dct = {}
for keys in lst:
    reduce(lambda d, k: d.setdefault(k, {}), keys.split('/'), dct)

print(dct) # {'a': {'b': {'a': {}}, 'c': {'a': {}}}, 'b': {}}

def read_dct(dct, prefix=None):
    if not dct: # empty dict, i.e., at a leaf
        return

    sorted_items = sorted(dct.items(), key=itemgetter(0)) # sort on keys
    for i, (_, v) in enumerate(sorted_items, start=1):
        yield (current := f"{prefix}.{i}" if prefix else str(i))
        yield from read_dct(v, current)

print([*read_dct(dct)]) # ['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']

Basically, the first part builds a dictionary to represent the tree structure. And then I use a recursive function to make a list from the dict.

CodePudding user response：

Here's a similar solution to the accepted answer, but I think it might be more correct than that answer. If I understand the problem correctly, there should be exactly one value in the output list for each value in the input list. A input of ['a/b/c/d'] should result in ['1.1.1.1'], not in a list with four values.

Anyway, here's my solution, with a couple of extra test cases:

def doit(inp):

    def recursive_print(struct, sum=""):
        if sum and struct[1]:
            print(sum)
        for i, key in enumerate(sorted(struct[0].keys())):
            recursive_print(struct[0][key], sum   ("." if sum else "")   str(i   1))

    struct = [{}, False]

    for v in inp:
        p = last = struct
        for part in v.split('/'):
            if part not in p[0]:
                p[0][part] = [{}, False]
            p = p[0][part]
        p[1] = True

    recursive_print(struct)

inp = ['a','a/b','a/b/a','a/c','a/c/a','b']
doit(inp)

print()

inp = ['a/b/c/d']
doit(inp)

print()

inp = ['joe','joe/sam','larry/curly/moe','jerry/jill','jerry/jill/tom','jerry/jill/tony','alice/jill/betty/davis/eyes','john']
doit(inp)

Result:

1
1.1
1.1.1
1.2
1.2.1
2

1.1.1.1

1.1.1.1.1
2.1
2.1.1
2.1.2
3
3.1
4
5.1.1

CodePudding user response：

Since the goal is to simply convert the paths to indices according to their respective positions against other paths of the same prefix, there is no need to build a tree at all. Instead, iterate over the paths in alphabetical order while using a dict of sets to keep track of the prefixes at each level of paths, and join the lengths of sets at each level for output:

def indices(paths):
    output = {}
    names = {}
    for index, path in sorted(enumerate(paths), key=lambda t: t[1]):
        counts = []
        prefixes = tuple(path.split('/'))
        for level, name in enumerate(prefixes):
            prefix = prefixes[:level]
            names.setdefault(prefix, set()).add(name)
            counts.append(len(names[prefix]))
        output[index] = '.'.join(map(str, counts))
    return list(map(output.get, range(len(output))))

so that:

print(indices(['a', 'a/b', 'a/b/a', 'a/c', 'a/c/a', 'b']))
print(indices(['a', 'c', 'b', 'a/b']))
print(indices(['a/b/c/d', 'a/b/d', 'a/b/c']))
print(indices(['abc/d', 'bcc/d']))
print(indices(['apple/cat','apple/dog', 'banana/dog']))

outputs:

['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
['1', '3', '2', '1.1']
['1.1.1.1', '1.1.2', '1.1.1']
['1.1', '2.1']
['1.1', '1.2', '2.1']

Demo: https://replit.com/@blhsing/StainedMassivePi

CodePudding user response：

Since the parts of the string separated by / can presumably have different lengths, you can't just sort the strings directly. However, by splitting the strings over the /, you can get tuples, which you can sort directly in the way you want:

strings = ['a','a/b/a','a/b','a/b/d', 'b/a', 'b']
keys = sorted(map(lambda s: s.split('/'), strings))
print(keys)

Output:

[['a'], ['a', 'b'], ['a', 'b', 'a'], ['a', 'b', 'd'], ['b'], ['b', 'a']]