How do you sort sets in such a way that changes between consecutive sets are minimized according to-CodePudding

To formalize a bit: say we have an ordered list of sets, such that each set in the list is a subset of the set S. Next, we say if a consecutive pair of sets in the ordering has an element included in one set, and not the other, the transition count for the element, for this pair, is 1. Otherwise, this transition count is 0. We will also say that there exists a total ordering T for the elements in set S. How would you go about sorting the list of sets such that the total transition counts (for every consecutive pair in the ordered list of sets) per element is prioritized in the order provided by the ordering of the set S?

For instance, say the sets we want to order are the power set of a given set (all subsets), less an arbitrary number of sets. For my application, this represents generating remaining experiments with n explanatory variables toggled on or off (represented by set inclusion), for which some experiments may already have been conducted. The ordering of the variables used in the sorting is the ordering of how costly it is to toggle a variable, and we are trying to minimize the total transition cost for the set of remaining experiments.

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) 1))

# fields ordered by decreasing difficulty to swap
difficulty_ordering = ['a', 'b', 'c', 'd']

field_powerset = list(powerset(difficulty_ordering))

[print(subset) for subset in field_powerset]

which would be:

()
('a',)
('b',)
('c',)
('d',)
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')
('a', 'b', 'c')
('a', 'b', 'd')
('a', 'c', 'd')
('b', 'c', 'd')
('a', 'b', 'c', 'd')

and if we arbitrarily remove some:

field_powerset = random.sample(field_powerset, random.randrange(len(field_powerset)))

we get:

('a', 'b', 'd')
('a', 'd')
('b',)
()
('b', 'c', 'd')
('a', 'b', 'c', 'd')
('a', 'c')
('b', 'c')
('b', 'd')
('c', 'd')
('a', 'b', 'c')
('d',)

now how do we go about getting the sorted result of something roughly similar to this, using difficulty_ordering?

()
('c', 'd')
('d',)
('b',)
('b', 'c')
('b', 'c', 'd')
('b', 'd')
('a', 'b', 'd')
('a', 'b', 'c', 'd')
('a', 'b', 'c')
('a', 'c')
('a', 'd')

# 1 a transition, 2 b transitions, 6 c transitions, 5 d transitions

I wouldn't want to reimplement an entire sort if possible, but I'm unsure if this idea could even be implemented with python's "key function" for sorting.

But after solving the problem manually, it feels like there's a heuristic I can follow to get a good solution:

It seems you need to first "lock in" the ranges of sets, by first minimizing the amount of transitions of the first field. So first you subdivide (subdivisions represented by #) the lists into ones with 'a' and ones without

('b',)
()
('b', 'c', 'd')
('b', 'c')
('b', 'd')
('c', 'd')
('d',)
#
('a', 'b', 'd')
('a', 'd')
('a', 'b', 'c', 'd')
('a', 'c')
('a', 'b', 'c')

and going forward you can only move these elements within their range. then you move on to 'b' and further subdivide these ranges into two with and without, but now it needs to go "without b, with b, with b without b" to minimize transitions fully

()
('c', 'd')
('d',)
#
('b',)
('b', 'c', 'd')
('b', 'c')
('b', 'd')
#
('a', 'b', 'd')
('a', 'b', 'c', 'd')
('a', 'b', 'c')
#
('a', 'd')
('a', 'c')

then c, further flipping with c without c per section

()
('d',)
#
('c', 'd')
#
('b', 'c', 'd')
('b', 'c')
#
('b',)
('b', 'd')
#
('a', 'b', 'd')
#
('a', 'b', 'c', 'd')
('a', 'b', 'c')
#
('a', 'c')
#
('a', 'd')

and last, 'd', but it gets a little different here, as we would think we just swap the order of with/without per subdivision we are subdividing. But at a certain point you might not want to follow this as concretely, in the case of the 3rd subdivision above. If we break our rules a bit we get to a fully optimal solution here:

()
#
('d',)
#
('c', 'd')
#
('b', 'c', 'd')
#
('b', 'c')
#
('b',)
#
('b', 'd')
#
('a', 'b', 'd')
#
('a', 'b', 'c', 'd')
#
('a', 'b', 'c')
#
('a', 'c')
#
('a', 'd')

Choosing not to swap the with/without ordering in that subdivision gets you a lower total cost ordering. So essentially, it seems you need to match the previous set's inclusion of your current element when deciding the with/without ordering of your current subdivision. And this part makes me think doing recursion wouldn't be a great way of solving the problem. But, I'm not sure that I could formulate these thoughts and rules into a python dynamic programming algorithm fully at present.

CodePudding user response：

Your question reduces to the single-source shortest path problem on an undirected graph with natural number weights, which is addressed by Thorup, 1999, and is solvable in O(E) time. Alternatively, if your state transition costs are not natural numbers, then it is addressed by Fredman & Tarjan, 1984 and is solvable in O(E Vlog(V)) time.

Imagine laying out each object in this set as a vertex on a graph. For example, if you have three 'states' (x_1, x_2, x_3), then you have eight underlying configurations represented by the power set:

()
(x_3)
(x_2)
(x_1)
(x_2, x_3)
(x_1, x_3)
(x_1, x_2)
(x_1, x_2, x_3)

These eight configurations represent the nodes on a graph. Imagine restricting the graph to transition cost 1. Then you end up with the following configuration:

() -> (x_1) (x_2) (x_3)
(x_1) -> (x_1, x_2) (x_1, x_3), ()
(x_2) -> (x_1, x_2) (x_2, x_3), ()
(x_3) -> ...

What you would be optimizing for under this situation is the shortest path that traverses each vertex in this configuration. This is a simplified and reduced version of the problem you've presented here.

To complete the problem, imagine the complete graph for this situation, K_3. Under this configuration, you can assign edge weights according to the relative difficulties of toggling each state. For example, if (x_1, x_2, x_3) is ordered by increasing difficulty such that x_2 and x_3 should never override a transition on x_1, then assign by powers of two: x_1 a cost of 4, x_2 a cost of 2, and x_3 a cost of 1, which produces a proper order. How you assign cost when there are multiple states to transition is a design choice dependent on implementation details not shared in your question.

To accommodate configurations that have already been covered, you can simply remove them from the powerset as you've done above, and re-run the shortest path solution on the remaining vertices.

CodePudding user response：

It seems I was able to implement the greedy dynamic programming algorithm I described above, and it works for my purposes. I'm not going to say it's an optimal solution to the problem, and not sure the time complexity (O(m^2*n) maybe? where m is the number of fields and n is the number of sets to order). But it had less cognitive load to implement than mapping the solution to a graph theory problem.

from itertools import chain, combinations
import random


def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) 1))


def flatten(to_flatten):
    for x in to_flatten:
        if hasattr(x, '__iter__') and not isinstance(x, str) and not isinstance(x, tuple):
            for y in flatten(x):
                yield y
        else:
            yield x


# order a list of tuples such that the transition between elements is minimized
# prioritizing elements in the order of field_order
def minimize_transitions(tuple_list, field_order):
    tuple_lists = [tuple_list]
    # whether the field is included last (true), or not included last (false), in the previous list
    for field in field_order:
        next_tuple_lists = []
        ending_with = False
        for tuple_list in tuple_lists:
            if len(tuple_list) == 1:
                ending_with = field in tuple_list[0]
                next_tuple_lists.append(tuple_list)
            elif len(tuple_list) == 0:
                continue
            else:
                if ending_with:
                    next_tuple_lists.append(list(filter(lambda this_tuple: field in this_tuple, tuple_list)))
                    next_tuple_lists.append(list(filter(lambda this_tuple: field not in this_tuple, tuple_list)))
                else:
                    next_tuple_lists.append(list(filter(lambda this_tuple: field not in this_tuple, tuple_list)))
                    next_tuple_lists.append(list(filter(lambda this_tuple: field in this_tuple, tuple_list)))
                ending_with = not ending_with
        tuple_lists = next_tuple_lists
    return flatten(tuple_lists)


# fields ordered by decreasing difficulty to swap
field_order = ['a', 'b', 'c', 'd', 'e']
field_powerset = list(powerset(field_order))
field_powerset = random.sample(field_powerset, random.randrange(len(field_powerset)))
print("Starting sets:")
[print(subset) for subset in field_powerset]
print()
print("Ordering for minimized transitions:")
field_powerset = minimize_transitions(field_powerset, field_order)
[print(subset) for subset in field_powerset]

Starting sets:
('a', 'c', 'e')
('d', 'e')
('c', 'e')
('a', 'c', 'd')
('a', 'b', 'd', 'e')
('c', 'd', 'e')
('b', 'c', 'd', 'e')
('b', 'd')
('a', 'b', 'c', 'd', 'e')
('a', 'c', 'd', 'e')
('a', 'b', 'd')
('e',)
('a', 'b', 'c')
()
('a', 'b', 'c', 'e')
('a', 'b', 'c', 'd')
('a', 'b', 'e')
('d',)

Ordering for minimized transitions:
()
('e',)
('d', 'e')
('d',)
('c', 'd', 'e')
('c', 'e')
('b', 'c', 'd', 'e')
('b', 'd')
('a', 'b', 'd')
('a', 'b', 'd', 'e')
('a', 'b', 'e')
('a', 'b', 'c', 'e')
('a', 'b', 'c')
('a', 'b', 'c', 'd')
('a', 'b', 'c', 'd', 'e')
('a', 'c', 'd', 'e')
('a', 'c', 'd')
('a', 'c', 'e')