Are predicted probabilities from glm models probabilities of 0 or 1?

Time:11-18

My response variable, status, has two values: 1 for alive and 0 for dead.

I have built a model like this: model <- glm(status ~ ., data = train_data, family = 'binomial'). I then call predict(model, test_data, type = 'response'), which gives me a vector of predicted probabilities, like this:

0.02  0.04  0.1

Are these probabilities of someone being alive (i.e. status == 1) or someone being dead (i.e. status == 0)?

I'm pretty sure it's probabilities of someone being alive, but is this always the case? Is there a way to specify this directly in the predict() function?

CodePudding user response:

From ?binomial:

For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:

  1. As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
  2. As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
  3. As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
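The factor case can be checked with a small sketch. The data below are simulated and the names (x, alive, status_f, status_rev) are made up for illustration. The idea is that whichever level comes first is treated as "failure", so reversing the level order should flip the predicted probabilities.

```r
# Simulated data (hypothetical): larger x makes "alive" more likely.
set.seed(1)
n <- 200
x <- rnorm(n)
alive <- rbinom(n, size = 1, prob = plogis(1.5 * x))

# Factor response whose first level is "dead", so "alive" is the success.
status_f <- factor(ifelse(alive == 1, "alive", "dead"),
                   levels = c("dead", "alive"))
fit_alive <- glm(status_f ~ x, family = binomial)
p_alive <- predict(fit_alive, type = "response")

# Same data with "alive" as the first level, so "dead" is now the success.
status_rev <- relevel(status_f, ref = "alive")
fit_dead <- glm(status_rev ~ x, family = binomial)
p_dead <- predict(fit_dead, type = "response")

# Mean predicted probability within each observed group.
tapply(p_alive, status_f, mean)

# The two sets of predictions should be complements of each other.
all.equal(unname(p_alive), unname(1 - p_dead))
```

Checking levels(status_f)[1] before fitting is a quick way to see which outcome the model treats as failure.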

If status is numeric with values of 0 or 1, the "total number of cases" is assumed to be 1, i.e., each observation is the failure (0) or success (1) of a single individual. The predicted probability is therefore always the "probability of 1": 0 always means "failure" and 1 always means "success".
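A quick way to convince yourself of this for a numeric 0/1 response is an intercept-only model, where the predicted probability should match the observed share of 1s. The status vector here is made up for illustration.

```r
# Made-up response: 6 alive (1) and 2 dead (0).
status <- c(1, 1, 1, 0, 1, 0, 1, 1)

# Intercept-only logistic regression: one common probability for everyone.
fit0 <- glm(status ~ 1, family = binomial)
p <- predict(fit0, type = "response")

# Compare the fitted probability with the share of each outcome.
unique(round(p, 6))
mean(status == 1)   # share of 1s (alive)
mean(status == 0)   # share of 0s (dead)
```

The fitted value lines up with the share of 1s rather than the share of 0s, which is what "probability of 1" means here.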

There is no way to change this in predict(), as far as I know: if you wanted to flip the probabilities you would need to use 1-status rather than status as your response variable.
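As a rough sketch of both routes to the "probability of dead": because the two outcomes are complementary, subtracting the predictions from 1 should give the same numbers as refitting on 1 - status. The data frames and the single predictor x below are simulated stand-ins, not the asker's actual data.

```r
# Simulated stand-ins for train_data and test_data (hypothetical columns).
set.seed(42)
train_data <- data.frame(x = rnorm(100))
train_data$status <- rbinom(100, size = 1, prob = plogis(train_data$x))
test_data <- data.frame(x = c(-1, 0, 1))

# Original model: predictions are P(status == 1), i.e. P(alive).
model <- glm(status ~ x, data = train_data, family = binomial)
p_alive <- predict(model, newdata = test_data, type = "response")

# Option 1: take the complement of the predictions, no refit needed.
p_dead <- 1 - p_alive

# Option 2: refit with the response flipped, as suggested in the answer.
model_dead <- glm(I(1 - status) ~ x, data = train_data, family = binomial)
p_dead_refit <- predict(model_dead, newdata = test_data, type = "response")

# Side-by-side comparison of the two options.
cbind(p_alive, p_dead, p_dead_refit)
```

Option 1 is usually the simpler choice, since it leaves the fitted model and its coefficients untouched.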
