My response variable, status, has two values: 1 for alive, 0 for dead.
I have built a model like this one: model <- glm(status ~ ., train_data, family = 'binomial'). I use predict(model, test_data, type = 'response'), which gives me a vector of predicted probabilities, like this:
0.02 0.04 0.1
Are these probabilities of someone being alive (i.e. status == 1) or someone being dead (i.e. status == 0)?
I'm pretty sure they are probabilities of someone being alive, but is this always the case? Is there a way to specify this directly in the predict() function?
Answer:
From ?binomial:
For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
- As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
- As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
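The factor rule quoted above can be surprising when the levels are strings, because R sorts factor levels alphabetically by default. The following is a minimal sketch on simulated data (the names x, y01, status_f and status_f2 are made up for illustration): with the labels "alive" and "dead", the first level is "alive", so the modelled "success" is "dead" unless the factor is releveled.

```r
# Simulated data; every name below is illustrative, not from the question.
set.seed(2)
x   <- rnorm(100)
y01 <- rbinom(100, size = 1, prob = plogis(1 + x))  # 1 = alive, 0 = dead

# Default factor levels are alphabetical: "alive" comes first.
status_f <- factor(ifelse(y01 == 1, "alive", "dead"))
levels(status_f)

# "Success" is "not the first level", so this models P(status == "dead").
fit_f <- glm(status_f ~ x, family = binomial)
p_f   <- fitted(fit_f)

# Making "dead" the first (reference) level flips the model to P("alive").
status_f2 <- relevel(status_f, ref = "dead")
fit_f2    <- glm(status_f2 ~ x, family = binomial)
p_f2      <- fitted(fit_f2)

# The two sets of fitted probabilities are complements of each other.
all.equal(unname(p_f), 1 - unname(p_f2), tolerance = 1e-6)
```

Checking levels() before fitting is a cheap way to confirm which outcome the predicted probabilities refer to.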
If status is numeric with values of 0 or 1, the "total number of cases" is assumed to be 1 (i.e., each observation is the failure (0) or success (1) of a single individual). (The probability is always "probability of 1", i.e. 0 always means "failure" and 1 always means "success".)
There is no way to change this in predict(), as far as I know: if you wanted to flip the probabilities you would need to use 1-status rather than status as your response variable, or simply subtract the predicted probabilities from 1.
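If you want to convince yourself on your own data, something like the following sketch may help. It uses simulated data with a single made-up predictor (age, dat, p_alive and the other names are assumptions for illustration, not from the question) and spells the predictor out explicitly instead of using the . shorthand. It checks that the predictions track the share of 1s, and that refitting on the flipped response gives the same result as taking 1 minus the predicted probability.

```r
# Simulated data; all names are illustrative.
set.seed(1)
n      <- 200
age    <- rnorm(n, mean = 50, sd = 10)
# Older individuals are less likely to be alive (status == 1).
status <- rbinom(n, size = 1, prob = plogis(6 - 0.1 * age))
dat    <- data.frame(status = status, age = age)

model   <- glm(status ~ age, data = dat, family = binomial)
p_alive <- predict(model, newdata = dat, type = "response")

# With an intercept in the model, the mean fitted probability on the
# training data matches the observed share of 1s, not the share of 0s.
all.equal(mean(p_alive), mean(dat$status), tolerance = 1e-6)

# Option 1: take the complement of the predictions.
p_dead <- 1 - p_alive

# Option 2: refit with the flipped response.
dat$dead     <- 1 - dat$status
model_dead   <- glm(dead ~ age, data = dat, family = binomial)
p_dead_refit <- predict(model_dead, newdata = dat, type = "response")

# Both routes give the same probabilities of status == 0.
all.equal(unname(p_dead), unname(p_dead_refit), tolerance = 1e-6)
```

Taking the complement is usually the simpler choice, since it avoids fitting a second model.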