Mapping list changes to their new index (Python)

I am working on a piece of software that clusters images for the user to label. In each iteration the user can merge clusters or rename a cluster's label, and I am looking for an algorithm that maps each previous cluster index to its new index, given the previous cluster list and the user's input cluster list. I keep the previous clusters' label names in a previous_clusters list. If the user marks a cluster as 'Ignore', it should map to -1 and be removed from the list. Below is the workflow, with the four cases I need to account for:

Iteration 1:

Merging ClassC to ClassE

Input:

previous_clusters = ["ClassA", "ClassB", "ClassC", "ClassD", "ClassE"]
clusters = ["ClassA", "ClassB", "ClassE", "ClassD", "ClassE"]

desired output:

{0:0, 1:1, 2:2, 3:3, 4:2}

Iteration 2:

Merging ClassA to ClassE

Input:

previous_clusters = ["ClassA", "ClassB", "ClassE", "ClassD"]
clusters = ["ClassE", "ClassB", "ClassE", "ClassD"]

desired output:

{0:0, 1:1, 2:0, 3:2}

Iteration 3:

Renaming ClassB to ClassF

Input:

previous_clusters = ["ClassE", "ClassB", "ClassD"]
clusters = ["ClassE", "ClassF", "ClassD"]

desired output:

{0:0, 1:1, 2:2}

Iteration 4:

Ignoring ClassE

Input:

previous_clusters = ["ClassE", "ClassF", "ClassD"]
clusters = ["Ignore", "ClassF", "ClassD"]

desired output:

{0:-1, 1:0, 2:1}

Resulting list for the next iteration:

previous_clusters = ["ClassF", "ClassD"]

Answer:

Note that previous_clusters is not needed (although it helped me understand the context). The only information you need is of the form "for index 0, the user selected 'ClassA'". Collect all indices that map to each label (for example 'ClassA'), then invert that map to get the result: each distinct label gets a new index in order of first appearance, and ignored indices map to -1. This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onward.

from collections import defaultdict

def recluster(new):
    indices_mapped_to = defaultdict(list)
    indices_ignored = [] # list of indices to be ignored

    for i, new_class in enumerate(new):
        if new_class == 'Ignore':
            indices_ignored.append(i)
        else:
            indices_mapped_to[new_class].append(i)

    # "invert" the dict
    output = {j: i for i, v in enumerate(indices_mapped_to.values()) for j in v}
    output.update({j: -1 for j in indices_ignored}) # add the ignored cases

    return output

print(recluster(["ClassA", "ClassB", "ClassE", "ClassD", "ClassE"]))
# {0: 0, 1: 1, 2: 2, 4: 2, 3: 3}
print(recluster(["ClassE", "ClassB", "ClassE", "ClassD"]))
# {0: 0, 2: 0, 1: 1, 3: 2}
print(recluster(["ClassE", "ClassF", "ClassD"]))
# {0: 0, 1: 1, 2: 2}
print(recluster(["Ignore", "ClassF", "ClassD"]))
# {1: 0, 2: 1, 0: -1}
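
The question also shows the label list carried into the next iteration (previous_clusters), which recluster does not return. A minimal single-pass sketch that produces both is given below. The name recluster_with_labels and the ignore_label parameter are hypothetical, not part of the original answer, and the sketch assumes new indices are assigned in order of first appearance, as in the desired outputs above.

```python
def recluster_with_labels(new, ignore_label="Ignore"):
    """Return (mapping, labels).

    mapping: old index -> new index (-1 for ignored clusters)
    labels:  deduplicated label list to use as the next previous_clusters
    """
    new_index_of = {}  # label -> new index, in order of first appearance
    mapping = {}       # old index -> new index

    for old_index, label in enumerate(new):
        if label == ignore_label:
            mapping[old_index] = -1
        else:
            # setdefault assigns the next free index the first time a label is seen
            mapping[old_index] = new_index_of.setdefault(label, len(new_index_of))

    return mapping, list(new_index_of)


mapping, previous_clusters = recluster_with_labels(["Ignore", "ClassF", "ClassD"])
print(mapping)            # {0: -1, 1: 0, 2: 1}
print(previous_clusters)  # ['ClassF', 'ClassD']
```

Because the mapping is filled while walking the input in order, its keys come out sorted by old index, which matches the layout of the desired outputs in the question.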