Binary classification: why use 1/0 as labels, and what's the difference from 1/-1 or even 100/-100?

In binary classification problems, we usually use 1 for the positive label and 0 for the negative label. Why is that? In particular, why use 0 rather than -1 for the negative label?

What difference does it make to use -1 for the negative label? More generally, could we use 100 for the positive label and -100 for the negative label?

CodePudding user response：

As the name suggests, labeling is used to differentiate the classes. You can use 0/1, 1/-1, cat/dog, etc. (any names that fit your problem). For example:

If you want to distinguish between cat and dog images, then use cat and dog labels.
If you want to detect spam, then labels will be spam/genuine.

However, because most ML algorithms work with numbers, labels are converted to a numeric format before training.
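As a minimal sketch of that conversion, assuming the spam/genuine example above (the helper name `encode_labels` is made up for illustration and is not from any library):

```python
def encode_labels(labels, positive):
    """Map each label to 1 if it equals the positive class, otherwise 0."""
    return [1 if label == positive else 0 for label in labels]


# Hypothetical raw labels for a spam detector.
raw_labels = ["spam", "genuine", "genuine", "spam"]
encoded = encode_labels(raw_labels, positive="spam")
print(encoded)  # [1, 0, 0, 1]
```

Which class gets 1 is a choice you make; the model only ever sees the numbers.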

CodePudding user response：

Labels of 0 and 1 arise naturally from some of the earliest methods used for binary classification. For example, logistic regression directly models the probability of an event, where the event is an object belonging to the positive class. Training data with labels 0 and 1 then says that objects with label 0 belong to that class with probability 0, and objects with label 1 belong to it with probability 1. In spam classification, emails that are not spam get label 0, because their probability of being spam is 0, and emails that are spam get label 1, because their probability of being spam is 1.

So using labels of 0 and 1 makes perfect sense mathematically. When a binary classification model outputs, say, 0.4 for some input, we can usually interpret this as the probability of belonging to class 1 (although strictly it's not always the case, as pointed out for example here).
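To make the link between labels and probabilities concrete, here is a rough sketch of the logistic loss written two ways: once for labels in {0, 1} and once for labels in {-1, +1}. The function names are my own, and `z` stands for the raw (pre-sigmoid) score of a hypothetical model. The two forms give the same value once the labels are mapped with y' = 2y - 1, so the choice between 0/1 and 1/-1 changes the formula, not the model.

```python
import math


def sigmoid(z):
    """Squash a raw score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss_01(y, z):
    """Cross-entropy for a label y in {0, 1}; y is read as a target probability."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def log_loss_pm1(y, z):
    """Logistic loss for a label y in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * z))


# Check on a few scores that both forms agree after mapping 0/1 to -1/+1.
for z in (-2.0, 0.3, 1.5):
    for y01 in (0, 1):
        y_pm1 = 2 * y01 - 1
        assert math.isclose(log_loss_01(y01, z), log_loss_pm1(y_pm1, z))
```

Labels such as 100/-100 would first have to be mapped back to one of these forms, since 100 cannot be read as a probability.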

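There are classification methods that don't make use of the convenient properties of labels 0 and 1, such as support vector machines or linear discriminant analysis. Textbook SVMs are in fact usually written with labels -1 and +1, because that makes the margin condition y(w·x + b) ≥ 1 compact. This is only a notational convenience, though: the two classes can be relabeled freely without changing the classifier, so using 0 and 1 is still okay.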
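For margin-based methods, a common textbook convention is to write the loss with labels -1/+1. A small sketch of the hinge loss under that convention, with 0/1 labels converted on the fly (the helper names and the scores are invented for illustration):

```python
def to_pm1(y01):
    """Map a label in {0, 1} to {-1, +1}."""
    return 2 * y01 - 1


def hinge_loss(y_pm1, score):
    """Hinge loss for a label in {-1, +1} and a raw decision score."""
    return max(0.0, 1.0 - y_pm1 * score)


labels_01 = [0, 1, 1, 0]
scores = [-2.0, 0.5, 3.0, 0.25]  # hypothetical decision scores
losses = [hinge_loss(to_pm1(y), s) for y, s in zip(labels_01, scores)]
print(losses)  # [0.0, 0.5, 0.0, 1.25]
```

The data can keep its 0/1 labels; the conversion is a one-line affine map that lives inside the loss.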

Even the encoding of classes for multiclass classification makes use of the probabilities of belonging to each class. For example, in classification with three classes, objects from the first class are encoded as [1 0 0], from the second class as [0 1 0], and from the third class as [0 0 1], which again can be interpreted as probabilities. (This is called one-hot encoding.) The output of a multiclass classification model is often a vector of the form [0.1 0.6 0.3], which can be conveniently interpreted as a vector of class probabilities for the given object.
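A short sketch of both halves of that idea: one-hot targets on the label side, and a softmax on the output side, which is one common way (assumed here, not stated above) for a model to produce a probability vector. The raw scores are made up.

```python
import math


def one_hot(class_index, num_classes):
    """Return a list with 1 at class_index and 0 elsewhere."""
    return [1 if i == class_index else 0 for i in range(num_classes)]


def softmax(scores):
    """Turn raw scores into non-negative values that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


target = one_hot(1, 3)
print(target)  # [0, 1, 0]

probs = softmax([0.2, 2.0, 1.3])  # hypothetical raw model scores
predicted_class = probs.index(max(probs))
```

The one-hot target is just the probability vector of an object whose class is known with certainty, which is why the two can be compared directly in a loss.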