Algorithm for grouping non-transitive pairs of items into maximal (overlapping) subsets

I'm working on an algorithm to combine matching pairs of items into larger groups. The problem is that these pairs are not transitive: 1=2 and 2=3 does not necessarily mean that 1=3. They are, however, symmetric, so 1=2 implies 2=1.

Each item can belong to multiple groups, but each group should be as large as possible; for example, if 1=2, 1=3, 1=4, 1=5, 2=3, 3=4, and 4=5, then we'd want to end up with groups of 1-2-3, 1-3-4, and 1-4-5.

The best solution I've come up with so far is to work recursively: for any given item, iterate through every later item, and if they match, recurse and iterate through every item after that one to see whether it matches all of the ones collected so far. (And then check that there isn't a larger group that already contains that combination; e.g. in the above example I'd be about to output 4-5, but would then go back and find that those items were already incorporated into 1-4-5.)
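
Roughly, in Python, that looks something like this (a simplified sketch, not my actual code; the function name and pair format are just for illustration):

def brute_force_groups(pairs):
    # Collect the distinct items and a quick lookup for matching pairs.
    items = sorted({x for pair in pairs for x in pair})
    matches = {frozenset(pair) for pair in pairs}
    groups = []

    def extend(group, start):
        grew = False
        for j in range(start, len(items)):
            candidate = items[j]
            # A candidate must match every item collected so far.
            if all(frozenset((candidate, member)) in matches for member in group):
                extend(group + [candidate], j + 1)
                grew = True
        # Keep the group only if it could not be grown any further and is not
        # already contained in a larger group found earlier.
        if not grew and not any(set(group) <= set(g) for g in groups):
            groups.append(group)

    for i in range(len(items)):
        extend([items[i]], i + 1)
    return groups

print(brute_force_groups([[1, 2], [1, 3], [1, 4], [1, 5], [2, 3], [3, 4], [4, 5]]))
# prints [[1, 2, 3], [1, 3, 4], [1, 4, 5]]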

The sets involved are not enormous - rarely more than 40 or 50 items - but I might be working with thousands of these little sets in a single operation. So computational-complexity-wise it's totally fine if it's O(n²) or whatever because it's not going to have to scale to enormous sets, but I'd like it to be as fast as possible on those little 50-item sets.

Anyway, while I can probably make do with the above solution, it feels needlessly awkward and slow, so if there's a better approach I'd love to hear about it.

CodePudding user response:

If you want ALL maximal groups, then there is no subexponential algorithm for this problem. As https://cstheory.stackexchange.com/questions/8390/the-number-of-cliques-in-a-graph-the-moon-and-moser-1965-result points out, the number of maximal cliques can itself grow exponentially in the size of the graph (up to 3^(n/3) for n items, by the Moon and Moser 1965 result), so just writing them all down can take exponential time.
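
(If you really do want every maximal group despite that worst case, the standard algorithm is Bron-Kerbosch, which networkx exposes as find_cliques. A minimal sketch, assuming the same pair format as below and that pulling in networkx is acceptable:)

import networkx as nx

def all_maximal_groups(pairs):
    # Items become nodes and matching pairs become edges; the maximal groups
    # are then exactly the maximal cliques of this graph.
    graph = nx.Graph()
    graph.add_edges_from(pairs)
    return [sorted(clique) for clique in nx.find_cliques(graph)]

print(all_maximal_groups([[1, 2], [1, 3], [1, 4], [1, 5], [2, 3], [3, 4], [4, 5]]))
# e.g. [[1, 2, 3], [1, 3, 4], [1, 4, 5]] (clique order is not guaranteed)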

If you want just a set of maximal groups that covers all of the original relationships, then you can solve this in polynomial time (though not with a great bound). The idea below is greedy: record which items each item matches, then for every matching pair that isn't yet covered by some group, start a group from that pair and keep adding items that match every member collected so far until none are left.

def maximal_groups(pairs):
    # related[x] is the set of items that match x; not_included[x] tracks the
    # matches of x that have not yet been placed in any output group.
    related = {}
    not_included = {}
    for pair in pairs:
        for item in pair:
            if item not in related:
                related[item] = set()
                not_included[item] = set()
        related[pair[0]].add(pair[1])
        related[pair[1]].add(pair[0])
        not_included[pair[0]].add(pair[1])
        not_included[pair[1]].add(pair[0])

    groups = []
    for item in sorted(related.keys()):
        # Keep starting new groups from this item until every one of its
        # matching pairs has been covered by some group.
        while 0 < len(not_included[item]):
            other_item = not_included[item].pop()
            not_included[other_item].remove(item)
            group = [item, other_item]
            # Candidates are items that match both current members.
            available = [x for x in sorted(related[item]) if x in related[other_item]]
            while 0 < len(available):
                next_item = available[0]
                # Mark the pairs between next_item and the group as covered.
                for prev_item in group:
                    if prev_item in not_included[next_item]:
                        not_included[next_item].remove(prev_item)
                        not_included[prev_item].remove(next_item)
                group.append(next_item)
                # Narrow the candidates to items that also match next_item.
                available = [x for x in available if x in related[next_item]]
            groups.append(group)
    return groups

print(maximal_groups([[1,2], [1,3], [1,4], [1,5], [2,3], [3,4], [4,5]]))
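
For the example input this covers the three groups from the question (1-2-3, 1-3-4 and 1-4-5). The exact ordering of items within each group (and, in general, which covering groups you get) depends on set iteration order, since not_included[item].pop() removes an arbitrary element.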