The fitted values returned from `speedglm()` look really different from those returned from `glm()`, and I don't know why. For example, if I run this:
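```r
library(Matching)  # assumed source of the 445-row lalonde data (has married, nodegr, ...)
data("lalonde")
glm <- glm(married ~ treat + age + educ + black + hisp + nodegr,
           data = lalonde, family = "binomial")
fitted_vals <- glm$fitted.values
```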
I get broadly what I'd expect: one fitted value per observation, each between 0 and 1 (the two possible values of `married`). For example:
```r
skimr::skim(fitted_vals)
```

```
── Data Summary ────────────────────────
                           Values
Name                       fitted_vals
Number of rows             445
Number of columns          1
_______________________
Column type frequency:
  numeric                  1
________________________
Group variables            None

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean     sd     p0   p25   p50   p75  p100 hist
1 data                  0             1 0.169 0.0913 0.0378 0.105 0.147 0.205 0.627 ▇▅▁▁▁
```
However, if I run the same model using `speedglm()`, I get quite different results:
```r
library(speedglm)
speedglm <- speedglm(married ~ treat + age + educ + black + hisp + nodegr,
                     data = lalonde, family = binomial(), fitted = TRUE)
fitted_vals <- speedglm$linear.predictors
skimr::skim(fitted_vals)
```

```
── Data Summary ────────────────────────
                           Values
Name                       fitted_vals
Number of rows             445
Number of columns          1
_______________________
Column type frequency:
  numeric                  1
________________________
Group variables            None

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist
1 data                  0             1 -1.71 0.606 -3.24 -2.14 -1.76 -1.35 0.521 ▂▇▇▂▁
```
Does anyone know what's going on here? According to the documentation, `linear.predictors` seems to be the analogue of `glm`'s `fitted.values`. As far as I understand, it shouldn't be possible to get fitted values outside the range of the dependent variable, but that is clearly what's happening.
Answer:
"Linear predictors" are not the same as "fitted values" unless the GLM is fitted with an identity link. In general the linear predictor is `eta = b0 + b1*x1 + b2*x2 + ...`, while the fitted value is `mu = linkinv(eta)`, where `linkinv` is the inverse link function (here the logistic, i.e. inverse-logit, function). With a logit link, `eta` can be any real number, but `mu` always lies strictly between 0 and 1.
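As a rough numerical check using only the summary numbers posted in the question: the inverse logit is strictly increasing, so applying it to each quantile of the `speedglm` linear predictors should reproduce the matching quantile of the `glm` fitted values, up to the rounding in the `skim()` output. A minimal sketch, assuming base R's `plogis()` as the inverse of the logit link:

```r
## Quantiles (p0, p25, p50, p75, p100) copied from the two skim() outputs in the question
eta_q <- c(-3.24, -2.14, -1.76, -1.35, 0.521)   # speedglm linear.predictors
mu_q  <- c(0.0378, 0.105, 0.147, 0.205, 0.627)  # glm fitted.values

## plogis() is the inverse logit (what binomial()$linkinv computes for the logit link).
## Because it is monotone, it maps each quantile of eta onto the same quantile of mu.
mapped <- plogis(eta_q)
print(round(cbind(eta = eta_q, plogis_eta = mapped, glm_fitted = mu_q), 4))

## Remaining differences come only from the 3-significant-figure rounding in skim()
max(abs(mapped - mu_q))
```

The mean is the one statistic that does not carry over this way: `plogis(-1.71)` is roughly 0.15, not the 0.169 reported for `glm`, because the mean of a nonlinear transform is not the transform of the mean.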
In general it's safer to use accessor methods such as `fitted()` and `predict()`; that way you don't have to worry about how each package names its internal components:

```r
## fitted values (data scale)
all.equal(fitted(glm), fitted(speedglm))  ## TRUE
## predicted values (linear-predictor scale)
all.equal(predict(glm), predict(speedglm))  ## TRUE
## predict(., type = "response") == fitted(.)
all.equal(predict(glm, type = "response"), fitted(speedglm))  ## TRUE
```
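To see the `eta`/`mu` relationship without relying on either package's internal component names, here is a self-contained sketch using base R's `glm()` on simulated data. The data, coefficients and the variable names `x1`, `x2`, `y` are hypothetical, made up for illustration; the sketch does not call `speedglm` itself. It rebuilds `eta` from the model matrix and coefficients, passes it through the family's inverse link, and compares both with the accessor methods:

```r
## Hypothetical simulated data: y is 0/1, like `married`
set.seed(42)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
d$y <- rbinom(n, 1, plogis(-1.5 + 0.8 * d$x1 - 0.5 * d$x2))

fit <- glm(y ~ x1 + x2, data = d, family = binomial())

## eta = b0 + b1*x1 + b2*x2, rebuilt from the model matrix and coefficients
eta_manual <- drop(model.matrix(fit) %*% coef(fit))
## mu = linkinv(eta); for the logit link this is the inverse logit
mu_manual <- family(fit)$linkinv(eta_manual)

## check.attributes = FALSE compares values only, ignoring names
all.equal(eta_manual, predict(fit), check.attributes = FALSE)  ## TRUE
all.equal(mu_manual, fitted(fit), check.attributes = FALSE)    ## TRUE
all.equal(predict(fit, type = "response"), fitted(fit))        ## TRUE

range(eta_manual)  ## link scale: unbounded, includes negative values
range(mu_manual)   ## response scale: strictly between 0 and 1
```

The same pattern (accessors rather than `$` components) is what makes the `glm` and `speedglm` comparisons in the answer line up.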