I am working on a piece of software that clusters images for the user to label. In each iteration the user can merge clusters or rename a cluster's label, and I am looking for an algorithm that maps each previous cluster index to its new index, given the previous cluster list and the input cluster list. I keep the previous clusters' label names in a previous_classes list. If the user marks a cluster as 'Ignore', its index should map to -1 and the cluster should be removed. Below is the workflow, with the 4 edge cases I want to account for:
Iteration 1:
Merging ClassC to ClassE
Input:
previous_clusters = ["ClassA", "ClassB", "ClassC", "ClassD", "ClassE"]
clusters = ["ClassA", "ClassB", "ClassE", "ClassD", "ClassE"]
desired output:
{0:0, 1:1, 2:2, 3:3, 4:2}
Iteration 2:
Merging ClassA to ClassE
Input:
previous_clusters = ["ClassA", "ClassB", "ClassE", "ClassD"]
clusters = ["ClassE", "ClassB", "ClassE", "ClassD"]
desired output:
{0:0, 1:1, 2:0, 3:2}
Iteration 3:
Renaming ClassB to ClassF
Input:
previous_clusters = ["ClassE", "ClassB", "ClassD"]
clusters = ["ClassE", "ClassF", "ClassD"]
desired output:
{0:0, 1:1, 2:2}
Iteration 4:
Ignoring ClassE
Input:
previous_clusters = ["ClassE", "ClassF", "ClassD"]
clusters = ["Ignore", "ClassF", "ClassD"]
desired output:
{0:-1, 1:0, 2:1}
previous_clusters after removing the ignored cluster:
previous_clusters = ["ClassF", "ClassD"]
CodePudding user response:
Note that previous_clusters is not needed (although it helped me understand the context). The only information you need is of the form "for index 0, the user selected 'ClassA'". Collect all indices that map to each label, such as 'ClassA', and then invert that map to get the result (with some extra work to give the new classes unique, consecutive indices and to handle -1).
from collections import defaultdict

def recluster(new):
    indices_mapped_to = defaultdict(list)  # label -> old indices that now carry it
    indices_ignored = []  # list of indices to be ignored
    for i, new_class in enumerate(new):
        if new_class == 'Ignore':
            indices_ignored.append(i)
        else:
            indices_mapped_to[new_class].append(i)
    # "invert" the dict; labels are numbered in order of first appearance
    output = {j: i for i, v in enumerate(indices_mapped_to.values()) for j in v}
    output.update({j: -1 for j in indices_ignored})  # add the ignored cases
    return output

print(recluster(["ClassA", "ClassB", "ClassE", "ClassD", "ClassE"]))
# {0: 0, 1: 1, 2: 2, 4: 2, 3: 3}
print(recluster(["ClassE", "ClassB", "ClassE", "ClassD"]))
# {0: 0, 2: 0, 1: 1, 3: 2}
print(recluster(["ClassE", "ClassF", "ClassD"]))
# {0: 0, 1: 1, 2: 2}
print(recluster(["Ignore", "ClassF", "ClassD"]))
# {1: 0, 2: 1, 0: -1}
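If you also need the updated label list for the next iteration (as in the question's Iteration 4), the same idea can be done in a single pass. This is only a sketch under some assumptions: the function name recluster_with_labels and the ignore_label parameter are hypothetical, and new indices are assigned in order of first appearance, which matches the desired outputs in the question.

```python
def recluster_with_labels(new, ignore_label="Ignore"):
    """Return (mapping, labels).

    mapping: old cluster index -> new cluster index (-1 if ignored)
    labels:  surviving labels, positioned by their new index
    """
    new_index_of = {}  # label -> new index, in order of first appearance
    mapping = {}
    for old_index, label in enumerate(new):
        if label == ignore_label:
            mapping[old_index] = -1
        else:
            # The first time a label is seen it gets the next free index;
            # afterwards setdefault returns the index already assigned.
            mapping[old_index] = new_index_of.setdefault(label, len(new_index_of))
    return mapping, list(new_index_of)


mapping, labels = recluster_with_labels(["Ignore", "ClassF", "ClassD"])
print(mapping)  # {0: -1, 1: 0, 2: 1}
print(labels)   # ['ClassF', 'ClassD']
```

Here the keys of the mapping come out in ascending order of the old index, and the returned labels can be used as previous_clusters in the next iteration.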