Home > front end >  Sorting list of strings based on both sides of a delimiter ("|") in Python 3
Sorting list of strings based on both sides of a delimiter ("|") in Python 3

Time:11-19

I'm looking to sort a list of of strings which illustrate dependencies (the structure of a Bayesian Networks determined through the PC Algorithm).

e.g.

verbose_struct = ['A', 'C|A,E', 'E', 'B|C,D', 'D']
sorted_struct = ['A', 'E', 'D', 'C|A,E', 'B|C,D']

The order of the strings is determined by whether or not the dependencies (the letters following the delimiter '|', e.g. B is dependent on C and D) have been previously listed. As in the above, 'E' should be positioned before 'C|A,E' as C is dependent on E. Strings with no dependencies should be positioned before all strings with dependencies, e.g. 'D' before 'C|A,E' and 'B|C,D'.

How would I go about this?

I have managed to order the strings by whether or not they have dependencies using the following:

sorted_struct = sorted(verbose_struct, key=lambda x: len(x.split('|')))

I am unsure how to then further sort the variables by their dependencies, as I am fairly unfamiliar with lambda functions in Python.

CodePudding user response:

I would just "make the jump" here to making a class to hold these objects. by doing so, you can implement your own __lt__ method, which is all that is needed for all of the default sorting methods to sort the objects.

Note: This example class just deals with "labels" so when you are adding dependencies, you are just adding labels of other things. A more sophisticated class would hold the Element values of the dependencies instead of the labels, then you could kinda chain things together by looking up the dependencies of the dependencies, which you cannot do with this simple example. In order to make that work, you would need to define things in some kind of logical order (or do something more complicated) so that the earliest dependencies in the tree were added first. There are probably a bunch of examples out there for node-arc setups (which is what this describes) that do that.

The main thing here is that you can make a custom function to enable sorting by comparing 2 elements. You could also "peel out" this 2-item comparison function, make the pieces on the fly, and pass that function to the sort() method and it would also work without the class.

Code:

# Bayesian Elements

class Element():

    def __init__(self, data: str):
        self.label = data
        if '|' in data:
            # dependency...break it up
            self.base, dependencies = data.split('|')
            # break up the dependencies
            self.dependencies = set(dependencies.split(','))
        else:  # it is a single base element
            self.base = data
            self.dependencies = None

    def get_dependencies(self):
        return self.dependencies

    def get_base(self):
        return self.base

    def get_label(self):
        return self.label

    def __str__(self):
        return f'base: {self.base}, dep: {self.dependencies}'

    def __repr__(self):
        return self.label

    # in order to compare class elements, you must create a custom "less than" function
    # it is the basis of sort
    def __lt__(self, other: 'Element'):
        # case 1:  both are singletons, just alphabetically sort them:
        if self.dependencies is None and other.dependencies is None:
            return self.label < other.label
        # case 2:  self has no dependencies, but other does
        elif self.dependencies is None and other.dependencies is not None:
            return True
        # case 3:  other has no dependencies, self does
        elif other.dependencies is None and self.dependencies is not None:
            return False
        # case 4:  both have dependencies, check if self depends on other:
        else:
            if self.base in other.dependencies:
                return True
            elif other.base in self.dependencies:
                return False
            else:  # sort by the base element
                return self.base < other.base

data = ['B', 'A', 'C|A,E', 'E', 'B|C,D', 'D']

# make "elements" from the strings
elements = [Element(t) for t in data]

elements.sort()

print(elements)

# example
print(elements[4].get_dependencies())

Output:

[A, B, D, E, C|A,E, B|C,D]
{'A', 'E'}

CodePudding user response:

ideally you should create a dependacy tree and start from adding the leafs and removing the leafs. however for simpe example as yours you could just do a simple queue and start appending in your called "sorted" array

import queue

a = list(filter(lambda x: len(x) == 1, verbose_struct ))
b = {x.split('|')[0]: tuple(x.split('|')[1].split(',')) for x in  filter(lambda x: len(x) > 1, verbose_struct )}
nodes = a[:]
q = queue.deque(b.keys())
while q:
    cur = q.pop()
    if all(map(lambda x: x in nodes, b[cur])):
        nodes.append(cur[0])
        a.append(f"{cur}|"    ",".join(b[cur]))
    else:
        q.appendleft(cur)

a will be your sorted output

  • Related