I'm looking to sort a list of of strings which illustrate dependencies (the structure of a Bayesian Networks determined through the PC Algorithm).
e.g.
verbose_struct = ['A', 'C|A,E', 'E', 'B|C,D', 'D']
sorted_struct = ['A', 'E', 'D', 'C|A,E', 'B|C,D']
The order of the strings is determined by whether or not the dependencies (the letters following the delimiter '|', e.g. B is dependent on C and D) have been previously listed. As in the above, 'E' should be positioned before 'C|A,E' as C is dependent on E. Strings with no dependencies should be positioned before all strings with dependencies, e.g. 'D' before 'C|A,E' and 'B|C,D'.
How would I go about this?
I have managed to order the strings by whether or not they have dependencies using the following:
sorted_struct = sorted(verbose_struct, key=lambda x: len(x.split('|')))
I am unsure how to then further sort the variables by their dependencies, as I am fairly unfamiliar with lambda functions in Python.
CodePudding user response:
I would just "make the jump" here to making a class
to hold these objects. by doing so, you can implement your own __lt__
method, which is all that is needed for all of the default sorting methods to sort the objects.
Note: This example class just deals with "labels" so when you are adding dependencies, you are just adding labels of other things. A more sophisticated class would hold the Element
values of the dependencies instead of the labels, then you could kinda chain things together by looking up the dependencies of the dependencies, which you cannot do with this simple example. In order to make that work, you would need to define things in some kind of logical order (or do something more complicated) so that the earliest dependencies in the tree were added first. There are probably a bunch of examples out there for node-arc setups (which is what this describes) that do that.
The main thing here is that you can make a custom function to enable sorting by comparing 2 elements
. You could also "peel out" this 2-item comparison function, make the pieces on the fly, and pass that function to the sort()
method and it would also work without the class.
Code:
# Bayesian Elements
class Element():
def __init__(self, data: str):
self.label = data
if '|' in data:
# dependency...break it up
self.base, dependencies = data.split('|')
# break up the dependencies
self.dependencies = set(dependencies.split(','))
else: # it is a single base element
self.base = data
self.dependencies = None
def get_dependencies(self):
return self.dependencies
def get_base(self):
return self.base
def get_label(self):
return self.label
def __str__(self):
return f'base: {self.base}, dep: {self.dependencies}'
def __repr__(self):
return self.label
# in order to compare class elements, you must create a custom "less than" function
# it is the basis of sort
def __lt__(self, other: 'Element'):
# case 1: both are singletons, just alphabetically sort them:
if self.dependencies is None and other.dependencies is None:
return self.label < other.label
# case 2: self has no dependencies, but other does
elif self.dependencies is None and other.dependencies is not None:
return True
# case 3: other has no dependencies, self does
elif other.dependencies is None and self.dependencies is not None:
return False
# case 4: both have dependencies, check if self depends on other:
else:
if self.base in other.dependencies:
return True
elif other.base in self.dependencies:
return False
else: # sort by the base element
return self.base < other.base
data = ['B', 'A', 'C|A,E', 'E', 'B|C,D', 'D']
# make "elements" from the strings
elements = [Element(t) for t in data]
elements.sort()
print(elements)
# example
print(elements[4].get_dependencies())
Output:
[A, B, D, E, C|A,E, B|C,D]
{'A', 'E'}
CodePudding user response:
ideally you should create a dependacy tree and start from adding the leafs and removing the leafs. however for simpe example as yours you could just do a simple queue and start appending in your called "sorted" array
import queue
a = list(filter(lambda x: len(x) == 1, verbose_struct ))
b = {x.split('|')[0]: tuple(x.split('|')[1].split(',')) for x in filter(lambda x: len(x) > 1, verbose_struct )}
nodes = a[:]
q = queue.deque(b.keys())
while q:
cur = q.pop()
if all(map(lambda x: x in nodes, b[cur])):
nodes.append(cur[0])
a.append(f"{cur}|" ",".join(b[cur]))
else:
q.appendleft(cur)
a will be your sorted output