From my own experience, I have noticed that the accuracy of a classification model varies inversely with the number of classes in the target variable: the more classes the dependent variable has, the lower the model's accuracy. I don't know whether that drop is caused by the number of classes itself or by the imbalance between them (although oversampling did help improve the model's performance a bit). My assumption is that a larger number of classes leads to smaller differences in predicted probabilities between classes, so it is harder for the model to "confidently" pick the correct one.
Is there a more concrete theoretical basis to explain the observation above?
CodePudding user response:
The easiest way to understand what accuracy "means" with respect to the number of classes is to consider a random baseline. Guessing uniformly at random gives you 1/K accuracy, where K is the number of classes: 50% for 2 classes, but only 10% for 10 classes and just 1% for 100. This shows that "60% accuracy" means more the more classes you have: a binary classifier with 60% accuracy is almost random, but 60% for 100 classes is godlike.
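As a minimal sketch of that baseline argument (not part of the original answer; the seed, sample size, and class counts are arbitrary choices for illustration), you can simulate uniform random guessing and watch the accuracy track 1/K:

```python
import numpy as np

rng = np.random.default_rng(0)   # hypothetical seed, just for reproducibility
n_samples = 100_000

for k in (2, 10, 100):
    y_true = rng.integers(0, k, size=n_samples)  # balanced ground-truth labels
    y_pred = rng.integers(0, k, size=n_samples)  # uniform random guesses
    acc = np.mean(y_true == y_pred)
    print(f"K={k:3d}  random-guess accuracy ~ {acc:.3f}  (expected {1/k:.3f})")
```

So before judging a model by its raw accuracy, it helps to compare it against this 1/K baseline (or against a majority-class baseline when the classes are imbalanced).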