This thread covers some of the nuances of CTC loss and its unique way of capturing repeated characters and blanks in a sequence: "CTC: What is the difference between space and blank?", but its practical implementation is still unclear to me.
Let's say that I am trying to predict these two sequences, which correspond to two pictures:
seq_list = ['pizza', 'a pizza']
and I map their characters to integers for the model with something like:
mapping = {'p': 0,
'i': 1,
'z': 2,
'a': 3,
'blank': 4}
What do the individual labels look like?
pizza_label = [0, 1, 2, 4, 3] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza
Then, what about combining them so that the labels have the same shape for the model? Do we use blank for padding?
pizza_label = [0, 1, 2, 4, 3, 4] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza
Answer:
Padding: you have to pad the smaller image so that it has the same width as the larger image. Only then can both images be put into one batch. Simply pad with a black background.
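A minimal sketch of the padding step, assuming grayscale images stored as nested lists of pixel rows in which 0 means black. The helper name pad_to_width and the toy pixel values are made up for illustration and are not part of any library:

```python
def pad_to_width(image, target_width, fill=0):
    """Right-pad every row of a grayscale image (a list of rows) with `fill`."""
    return [row + [fill] * (target_width - len(row)) for row in image]

# Two toy "images" with the same height (2 rows) but different widths (3 and 5).
small = [[9, 9, 9],
         [9, 9, 9]]
large = [[7, 7, 7, 7, 7],
         [7, 7, 7, 7, 7]]

# Pad everything to the width of the widest image so the batch is rectangular.
max_width = max(len(img[0]) for img in (small, large))
batch = [pad_to_width(img, max_width) for img in (small, large)]
```

With NumPy arrays the same idea can be expressed with np.pad and constant zero padding along the width axis.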
Labels: you do not have to take care of the CTC blank yourself. It is enough to translate the character sequence into a sequence of labels (integers), e.g. with the Python expression [mapping[c] for c in 'pizza']. The CTC loss function takes care of handling the CTC blank.
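A minimal sketch of the label step for the two sequences from the question. It rests on a few assumptions: index 0 is reserved for the CTC blank (the default in PyTorch's torch.nn.CTCLoss; other libraries may reserve a different index), the space in 'a pizza' gets its own ordinary label, and the padding value PAD is a placeholder that is ignored because the true label lengths are passed to the loss alongside the padded labels. Note that both 'z' characters are kept and no blank is inserted between them:

```python
seq_list = ['pizza', 'a pizza']

# Assumption: index 0 is reserved for the CTC blank, so real characters start at 1.
# The space is an ordinary character and gets its own label.
chars = sorted(set(''.join(seq_list)))              # [' ', 'a', 'i', 'p', 'z']
mapping = {c: i + 1 for i, c in enumerate(chars)}

# Plain character-to-integer translation; no blanks are inserted by hand.
labels = [[mapping[c] for c in seq] for seq in seq_list]
label_lengths = [len(label) for label in labels]

# Pad the shorter label so both have the same length. PAD is a hypothetical
# filler value; the loss is told the true lengths via label_lengths.
PAD = 0
max_len = max(label_lengths)
padded_labels = [label + [PAD] * (max_len - len(label)) for label in labels]

print(padded_labels)   # [[4, 3, 5, 5, 2, 0, 0], [2, 1, 4, 3, 5, 5, 2]]
print(label_lengths)   # [5, 7]
```

The padded labels and their lengths are what a CTC loss that accepts target lengths would typically receive; how the lengths are passed depends on the framework.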