For multi-class classification, we use the softmax function to compute the class probabilities.
In the case of K = 2, we have softmax(a)_0 = e^{a_0}/(e^{a_0} + e^{a_1}) = 1/(1 + e^{a_1 - a_0}) = sigmoid(a_0 - a_1), which reduces softmax to the logistic function, so only 1 logit is needed.
I'm wondering whether it's possible to use only K-1 logits to model a multi-class classification problem with K classes?
CodePudding user response:
The question is essentially equivalent to asking "is there a surjective (preferably bijective) function from R^{n-1} to the (open) (n-1)-simplex, i.e. the set of strictly positive probability vectors over n classes", and the answer is of course yes. Some examples:
1. f([x1, ..., xn-1]) = softmax([x1, ..., xn-1, 0])
2. f([x1, ..., xn-1]) = [sigmoid(x1), (1-sigmoid(x1)) * softmax([x2, ..., xn-1, 0])]
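As a rough sketch of the two constructions in plain Python: the function names `probs_fixed_zero` and `probs_sigmoid_split` are made up for illustration, and the second one assumes the remaining logits get a 0 appended before the softmax, so that n-1 inputs produce n probabilities.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # softmax is shift-invariant, so the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Two-branch form to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def probs_fixed_zero(x):
    # Construction 1: append a fixed 0 logit, then apply softmax.
    return softmax(list(x) + [0.0])

def probs_sigmoid_split(x):
    # Construction 2 (assumed reading): sigmoid decides "first class vs. the rest",
    # and the leftover mass is shared out by a softmax with a 0 logit appended.
    p_first = sigmoid(x[0])
    rest = softmax(list(x[1:]) + [0.0])
    return [p_first] + [(1.0 - p_first) * r for r in rest]

x = [0.5, -1.2, 2.0]  # K - 1 = 3 free logits, so K = 4 classes
p1 = probs_fixed_zero(x)
p2 = probs_sigmoid_split(x)
print(len(p1), sum(p1))
print(len(p2), sum(p2))
```

Both functions return K strictly positive numbers that sum to 1 (up to floating-point error), but they generally give different distributions for the same input, so the K-1 logits mean different things in each parameterisation.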
In general, these will often introduce some arbitrary asymmetry into your formulation, which, by Occam's razor, is something we usually avoid.
Note that
softmax([-x, 0]) = [e^{-x}/(e^{-x} + e^0), 1/(e^{-x} + 1)]
                 = [1-sigmoid(x), sigmoid(x)]
So in a sense solution (1) is a generalisation of what you do with sigmoid in the K=2 case to the K>2 case. Unfortunately, you have to arbitrarily pick which of the dimensions you will substitute with 0.
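A quick numerical check of the two-class identity, plus an illustration of why the choice of pinned dimension is arbitrary: softmax is invariant to adding a constant to every logit, so subtracting any one logit from all of them pins that class to 0 without changing the probabilities. This is only a sketch using the standard library; the values of `x` and `full` are arbitrary.

```python
import math

def softmax(logits):
    # Shift by the max for numerical stability (does not change the output).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two-class case: softmax([-x, 0]) should equal [1 - sigmoid(x), sigmoid(x)].
x = 1.7
two_class = softmax([-x, 0.0])
identity_holds = (
    math.isclose(two_class[0], 1.0 - sigmoid(x))
    and math.isclose(two_class[1], sigmoid(x))
)

# K = 3: pin the last class or the first class to 0 by subtracting its logit.
full = [2.0, -0.5, 1.0]
pinned_last = [v - full[-1] for v in full]   # last logit becomes 0
pinned_first = [v - full[0] for v in full]   # first logit becomes 0
same_distribution = all(
    math.isclose(a, b) and math.isclose(a, c)
    for a, b, c in zip(softmax(full), softmax(pinned_last), softmax(pinned_first))
)

print(identity_holds, same_distribution)
```

Either pinning represents the same distribution; what changes is which class the remaining K-1 logits are measured against.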