I'm working on an OCR use case and have identified common misclassifications from the confusion matrix: for example, '1' is confused with 'J', and '2' is confused with 'Z' and 'J'.
For a given word, I am trying to write a Python script that generates every variant of the word that accounts for these misclassifications.
Example:
- Common Misclassifications: {'1':['J'],'2':['Z','J']}
- Input: "AB1CD2"
- Output: AB1CD2, AB1CDZ, ABJCD2, ABJCDZ, AB1CDJ, ABJCDJ
How do I go about solving this?
CodePudding user response:
itertools.product should help:
from itertools import product
misclass = {'1':['J'],'2':['Z','J']}
misclass_items = [tuple([k, *v]) for k, v in misclass.items()]
print(["AB" + x + "CD" + y for (x, y) in product(*misclass_items)])
# ['AB1CD2', 'AB1CDZ', 'AB1CDJ', 'ABJCD2', 'ABJCDZ', 'ABJCDJ']
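The snippet above hardcodes where '1' and '2' sit in this particular input. A minimal sketch of one way to generalize it to arbitrary words follows; the helper name `expand_word` is hypothetical and not part of the original answer, and it assumes every character missing from the dictionary is only ever read as itself.

```python
from itertools import product

def expand_word(word, misclass):
    """Return every variant of `word` under the misclassification map."""
    # For each position, the candidates are the character itself followed by
    # any characters it is commonly confused with (none if it is not a key).
    choices = [[ch, *misclass.get(ch, [])] for ch in word]
    # product picks one candidate per position, in every combination.
    return ["".join(combo) for combo in product(*choices)]

misclass = {'1': ['J'], '2': ['Z', 'J']}
print(expand_word("AB1CD2", misclass))
# ['AB1CD2', 'AB1CDZ', 'AB1CDJ', 'ABJCD2', 'ABJCDZ', 'ABJCDJ']
```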
CodePudding user response:
You get a neat solution by using a dictionary of all possible classifications, not just all mis-classifications. That is, you first "enrich" your misclassification dictionary with all possible correct classifications.
from itertools import product
all_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
common_misclass = {'1':['J'],'2':['Z','J']}
input_string = "AB1CD2"
common_class = {}
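# Map every character to itself plus any characters it is confused with.
for char in all_characters:
    if char in common_misclass:
        common_class[char] = [char] + common_misclass[char]
    else:
        common_class[char] = [char]
# Take one candidate per position and join each combination into a string.
possible_outputs = ["".join(tup) for tup in
                    product(*[common_class[letter] for letter in input_string])]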
print(possible_outputs)
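The number of outputs is the product of the candidate counts at each position, so it can grow quickly for long words with many ambiguous characters. The sketch below shows one way to count the variants up front and to iterate over them lazily instead of building the full list. The names `count_variants` and `iter_variants` are hypothetical, `math.prod` requires Python 3.8+, and the count assumes the misclassification lists contain no duplicates and never include the key itself.

```python
from itertools import product
from math import prod

def count_variants(word, misclass):
    """Number of variants: product of (1 + confusions) over each character."""
    return prod(1 + len(misclass.get(ch, [])) for ch in word)

def iter_variants(word, misclass):
    """Yield variants one at a time instead of materializing a list."""
    choices = [[ch, *misclass.get(ch, [])] for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

common_misclass = {'1': ['J'], '2': ['Z', 'J']}
print(count_variants("AB1CD2", common_misclass))  # 6
for variant in iter_variants("AB1CD2", common_misclass):
    print(variant)
```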