Home > Software engineering >  In python, create index from flat representation of nested structure in a list, sorting by alphabeti
In python, create index from flat representation of nested structure in a list, sorting by alphabeti

Time:10-28

I have lists where each entry is representing a nested structure, where / represents each level in the structure.

['a','a/b/a','a/b','a/b/d',....]

I want to take such a list and return an index list where each level is sorted in alphabetical order.

If we had the following list

['a','a/b','a/b/a','a/c','a/c/a','b']

It represents the nested structure

'a':                   #1

    'b':               #1.1
         'a': ...      #1.1.1
    'c':               #1.2
         'a': ...      #1.2.1
'b' : ...              #2

I am trying to get the output

 ['1','1.1','1.1.1', '1.2','1.2.1','2']

But I am having real issue on how to tackle the problem, would it be solved recursively? Or what would be a way to solve this for any generic list where each level is separated by /? The list is originally not necessarily sorted, and each level can be any generic word.

CodePudding user response:

Here's what I have tried:

from operator import itemgetter
from functools import reduce

lst = ['a','a/b','a/b/a','a/c','a/c/a','b']

# build a dict
dct = {}
for keys in lst:
    reduce(lambda d, k: d.setdefault(k, {}), keys.split('/'), dct)

print(dct) # {'a': {'b': {'a': {}}, 'c': {'a': {}}}, 'b': {}}

def read_dct(dct, prefix=None):
    if not dct: # empty dict, i.e., at a leaf
        return

    sorted_items = sorted(dct.items(), key=itemgetter(0)) # sort on keys
    for i, (_, v) in enumerate(sorted_items, start=1):
        yield (current := f"{prefix}.{i}" if prefix else str(i))
        yield from read_dct(v, current)

print([*read_dct(dct)]) # ['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']

Basically, the first part builds a dictionary to represent the tree structure. And then I use a recursive function to make a list from the dict.

CodePudding user response:

Here's a similar solution to the accepted answer, but I think it might be more correct than that answer. If I understand the problem correctly, there should be exactly one value in the output list for each value in the input list. A input of ['a/b/c/d'] should result in ['1.1.1.1'], not in a list with four values.

Anyway, here's my solution, with a couple of extra test cases:

def doit(inp):

    def recursive_print(struct, sum=""):
        if sum and struct[1]:
            print(sum)
        for i, key in enumerate(sorted(struct[0].keys())):
            recursive_print(struct[0][key], sum   ("." if sum else "")   str(i   1))

    struct = [{}, False]

    for v in inp:
        p = last = struct
        for part in v.split('/'):
            if part not in p[0]:
                p[0][part] = [{}, False]
            p = p[0][part]
        p[1] = True

    recursive_print(struct)

inp = ['a','a/b','a/b/a','a/c','a/c/a','b']
doit(inp)

print()

inp = ['a/b/c/d']
doit(inp)

print()

inp = ['joe','joe/sam','larry/curly/moe','jerry/jill','jerry/jill/tom','jerry/jill/tony','alice/jill/betty/davis/eyes','john']
doit(inp)

Result:

1
1.1
1.1.1
1.2
1.2.1
2

1.1.1.1

1.1.1.1.1
2.1
2.1.1
2.1.2
3
3.1
4
5.1.1

CodePudding user response:

Since the goal is to simply convert the paths to indices according to their respective positions against other paths of the same prefix, there is no need to build a tree at all. Instead, iterate over the paths in alphabetical order while using a dict of sets to keep track of the prefixes at each level of paths, and join the lengths of sets at each level for output:

def indices(paths):
    output = {}
    names = {}
    for index, path in sorted(enumerate(paths), key=lambda t: t[1]):
        counts = []
        prefixes = tuple(path.split('/'))
        for level, name in enumerate(prefixes):
            prefix = prefixes[:level]
            names.setdefault(prefix, set()).add(name)
            counts.append(len(names[prefix]))
        output[index] = '.'.join(map(str, counts))
    return list(map(output.get, range(len(output))))

so that:

print(indices(['a', 'a/b', 'a/b/a', 'a/c', 'a/c/a', 'b']))
print(indices(['a', 'c', 'b', 'a/b']))
print(indices(['a/b/c/d', 'a/b/d', 'a/b/c']))
print(indices(['abc/d', 'bcc/d']))
print(indices(['apple/cat','apple/dog', 'banana/dog']))

outputs:

['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
['1', '3', '2', '1.1']
['1.1.1.1', '1.1.2', '1.1.1']
['1.1', '2.1']
['1.1', '1.2', '2.1']

Demo: https://replit.com/@blhsing/StainedMassivePi

CodePudding user response:

Since the parts of the string separated by / can presumably have different lengths, you can't just sort the strings directly. However, by splitting the strings over the /, you can get tuples, which you can sort directly in the way you want:

strings = ['a','a/b/a','a/b','a/b/d', 'b/a', 'b']
keys = sorted(map(lambda s: s.split('/'), strings))
print(keys)

Output:

[['a'], ['a', 'b'], ['a', 'b', 'a'], ['a', 'b', 'd'], ['b'], ['b', 'a']]
  • Related