Is it possible to only use K-1 logits for K-class classification?

For multi-class classification, we use the softmax function to calculate the class probabilities.

In the case of K = 2, we have softmax(a)_0 = e^{a_0}/(e^{a_0} + e^{a_1}) = 1/(1 + e^{a_1 - a_0}) = sigmoid(a_0 - a_1), so softmax reduces to the logistic function and we only need 1 logit.
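The K = 2 identity can be checked numerically. The following is only a minimal sketch using the Python standard library; the helper names `softmax` and `sigmoid` and the example logits are my own choices for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

a0, a1 = 1.3, -0.4  # arbitrary example logits
p = softmax([a0, a1])

# Each pair should agree up to floating-point error.
print(p[0], sigmoid(a0 - a1))
print(p[1], sigmoid(a1 - a0))
```

Only the difference a_0 - a_1 matters, which is why one logit is enough for two classes.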

I'm wondering if it's possible to use only K-1 logits to model a multi-class classification problem when we have K classes.

CodePudding user response：

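The question is essentially equivalent to asking "is there a surjective (preferably bijective) function from R^{n-1} onto the probability simplex over n classes?" (strictly, onto its interior, since softmax and sigmoid never output exactly 0 or 1), and the answer is yes. Some examples: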

1. f([x1, ..., xn-1]) = softmax([x1, ..., xn-1, 0])
2. f([x1, ..., xn-1]) = [sigmoid(x1), (1-sigmoid(x1)) * softmax([x2, ..., xn-1, 0])]
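Both constructions can be sketched in a few lines of standard-library Python. The function names `pinned_softmax` and `sigmoid_then_softmax` are hypothetical, chosen only for this illustration. For the second construction, the sketch assumes the inner softmax is also padded with a fixed 0, so that the output has exactly n entries for n-1 inputs.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pinned_softmax(x):
    # Construction 1: append a fixed logit of 0, then apply softmax.
    return softmax(list(x) + [0.0])

def sigmoid_then_softmax(x):
    # Construction 2: the first class gets sigmoid(x[0]); the remaining
    # probability mass is split by a softmax over the other inputs,
    # padded with a fixed 0 (an assumption of this sketch).
    first = sigmoid(x[0])
    rest = softmax(list(x[1:]) + [0.0])
    return [first] + [(1.0 - first) * r for r in rest]

x = [0.5, -1.0, 2.0]  # n - 1 = 3 free logits, so n = 4 classes
for f in (pinned_softmax, sigmoid_then_softmax):
    probs = f(x)
    # Each output should have 4 entries that sum to 1 (up to rounding).
    print(f.__name__, len(probs), sum(probs))
```

The two functions generally give different distributions for the same input; they are simply two different parametrisations of the same set of distributions.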

In general, these will often introduce some arbitrary asymmetry into your formulation, which, by Occam's razor, is something we usually avoid.

Note that

softmax([-x, 0]) = [e^{-x}/(e^{-x} + e^0), 1/(e^{-x} + 1)]
                 = [1-sigmoid(x), sigmoid(x)]

So, in a sense, solution (1) is a generalisation of what you do with sigmoid in the K=2 case to the K>2 case. Unfortunately, you have to arbitrarily pick which of the dimensions you will substitute with 0.
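The reason no expressive power is lost is that softmax is invariant to adding a constant to every logit, so any K logits can be shifted until a chosen one is exactly 0. The sketch below checks this for each choice of pinned dimension; `reduce_logits` and `expand_logits` are hypothetical helper names used only here.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reduce_logits(z, k):
    # Shift all logits so that logit k becomes 0, then drop position k.
    return [zi - z[k] for i, zi in enumerate(z) if i != k]

def expand_logits(x, k):
    # Re-insert the fixed 0 at position k.
    return list(x[:k]) + [0.0] + list(x[k:])

z = [2.0, -1.0, 0.5]  # K = 3 arbitrary logits
target = softmax(z)
for k in range(len(z)):
    x = reduce_logits(z, k)  # K - 1 free logits
    probs = softmax(expand_logits(x, k))
    # The largest deviation from the full softmax should be at rounding level.
    print(k, x, max(abs(p - q) for p, q in zip(probs, target)))
```

Whichever dimension is pinned, the same distribution is recovered; the choice changes only the parametrisation, which is the asymmetry mentioned above.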