How to represent a dictionary of one-hot encoded elements?

This question has probably been asked before, but I could not find it. I want to be able to represent an input element as a one-hot encoded entity.

For that, do I need to create a dictionary of one-hot encoded items? How can I make sure that each new word that comes in is mapped to the correct encoded element? Do I need to build a dictionary? And how can I handle unknown words?

For example,

category = set(["Sweden", "Iceland", "Germany"])


My input: Sweden
Output  : 1, 0, 0

My input: Germany 
Output  : 0, 0, 1

My input: Poland  (unknown)
Output  : 0, 0, 0

Can someone please shed some light on this? Thank you in advance.

Answer:

I think the simplest way is to use torch with a dictionary that maps each word to an index.

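import torch
import torch.nn.functional as F

# Vocabulary; index 0 is reserved for unknown words
my_list = ["unknown", "hello", "world", "this", "is", "a", "test"]
print(my_list)

# Map each word to its integer index
dictionary = {}
for i, element in enumerate(my_list):
    dictionary[element] = i

print(dictionary)

# "unknown" is already in my_list, so no extra class is needed
num_classes = len(my_list)

# query: world (words missing from the dictionary fall back to the "unknown" index)
index = dictionary.get("world", dictionary["unknown"])
print(F.one_hot(torch.tensor(index), num_classes))
# → tensor([0, 0, 1, 0, 0, 0, 0])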
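The answer above maps an unknown word to the reserved "unknown" slot, so it gets a 1 in the first position rather than the all-zeros vector asked for in the question. A minimal sketch of one way to get all zeros, assuming the question's three countries as the vocabulary: keep the reserved index 0, encode a whole batch with `F.one_hot`, then drop column 0. The names `vocab`, `word_to_idx` and `encode` are hypothetical, chosen for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary; index 0 is reserved for unknown words.
vocab = ["unknown", "Sweden", "Iceland", "Germany"]
word_to_idx = {word: i for i, word in enumerate(vocab)}

def encode(words):
    """One-hot encode a list of words; unknown words become rows of zeros."""
    # Map every word to its index, falling back to the reserved slot 0.
    idx = torch.tensor([word_to_idx.get(w, 0) for w in words], dtype=torch.long)
    # Encode the whole batch at once, then drop column 0 (the "unknown" slot)
    # so unknown words end up as all-zero rows.
    return F.one_hot(idx, num_classes=len(vocab))[:, 1:]

print(encode(["Sweden", "Germany", "Poland"]))
```

Each row has one column per known country, so `Sweden` gives `1, 0, 0`, `Germany` gives `0, 0, 1` and `Poland` gives `0, 0, 0`, matching the outputs in the question.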

Answer:

Something like this; you just need the set of categories as a list:

all_categories = list(set(["Sweden", "Iceland", "Germany"]))
print(all_categories)
# Out (the order may differ from run to run): ['Germany', 'Sweden', 'Iceland']

Given the categories as a list of unique names:

def hotEncode(cat, all_categories):  # assuming all_categories is a list
    r = [0] * len(all_categories)  # list of zeros, one slot per category
    if cat in all_categories:
        n = all_categories.index(cat)  # position of cat in the list
        r[n] = 1
    return r

hotEncode("Iceland", all_categories)
# Out: [0, 0, 1]

hotEncode("Poland", all_categories)
# Out: [0, 0, 0]
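Because a list built from a set has an arbitrary order (string hashing is randomized between Python runs), the column assigned to each category can change from one run to the next. A minimal sketch of a variant, assuming the category list is fixed up front: sort it so the order is stable, and build a dictionary once so each lookup is a hash lookup instead of a linear `list.index` scan. The names `index_of` and `one_hot` are hypothetical helpers, not part of any library.

```python
# Hypothetical fixed category list; sorted so the column order is stable.
all_categories = sorted({"Sweden", "Iceland", "Germany"})

# Build the lookup table once: category -> column index.
index_of = {cat: i for i, cat in enumerate(all_categories)}

def one_hot(cat, index_of):
    """Return a one-hot list for cat, or all zeros if cat is unknown."""
    vec = [0] * len(index_of)
    i = index_of.get(cat)  # None for unknown categories
    if i is not None:
        vec[i] = 1
    return vec

print(all_categories)                # ['Germany', 'Iceland', 'Sweden']
print(one_hot("Iceland", index_of))  # [0, 1, 0]
print(one_hot("Poland", index_of))   # [0, 0, 0]
```

If you need the exact column order from the question (`Sweden` first), write the list out explicitly as `["Sweden", "Iceland", "Germany"]` instead of sorting it.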