Manually get the responses from GLM with gamma distribution and a GLM with inverse guassian distribu-CodePudding

I've been trying to manually get the response values given by the predict.glm function from the stats package in R. However, I'm unable to do so. I only know how to manually get the value with a binomial distribution. I would really appreciate some help. I created two small models (one with Gamma family and one with inverse Gaussian family).

library(stats)
library(dplyr)

data("USArrests")

#Gamma distribution model
model_gam <- glm(Rape~Murder   Assault   UrbanPop, data=USArrests, family=Gamma)

print(summary(model_gam))
responses_gam <- model_gam %>% predict(USArrests[1,], type="response")
print(responses_gam)

#Trying to manually get responses for gamma model
paste(coef(model_gam), names(coef(model_gam)), sep="*", collapse=" ")
# "0.108221470842499*(Intercept) -0.00122165587689519*Murder -9.47425665022909e-05*Assault -0.000467789606041651*UrbanPop"
print(USArrests[1,])
#Murder: 13.2, Assault: 236, UrbanPop: 58
x = 0.108221470842499 - 0.00122165587689519 * 13.2 - 9.47425665022909e-05 * 236 - 0.000467789606041651 * 58
# This is wrong. Do I have to include the dispersion? (which is 0.10609)
print (exp(x)/(1 exp(x)))
# result should be (from predict function): 26.02872 
# exp(x)/(1 exp(x)) gives: 0.510649


# Gaussian distribution model

model_gaus <- glm(Rape~Murder   Assault   UrbanPop, data=USArrests, family=inverse.gaussian(link="log"))

responses_gaus <- model_gaus %>% predict(USArrests[1,], type="response")
print(summary(model_gaus))
print(responses_gaus)

#Trying to manually get responses for gaussian model
paste(coef(model_gaus), names(coef(model_gaus)), sep="*", collapse=" ")
# "0.108221470842499*(Intercept) -0.00122165587689519*Murder -9.47425665022909e-05*Assault -0.000467789606041651*UrbanPop"
x =  1.70049202188329-0.0326196928618521* 13.2 -0.00234379099421488*236-0.00991369000675323*58
# Dispersion in this case is 0.004390825
print(exp(x)/(1 exp(x)))
# result should be (from predict function): 26.02872 
# exp(x)/(1 exp(x)) it is: 0.5353866

CodePudding user response：

built-in `predict()`

predict(model_gaus)["Alabama"] ## 3.259201

by hand

cat(paste(round(coef(model_gaus),5), names(coef(model_gaus)), sep="*", collapse=" "),"\n")
## 1.70049*(Intercept) 0.03262*Murder 0.00234*Assault 0.00991*UrbanPop
USArrests["Albama",]
##         Murder Assault UrbanPop Rape
## Alabama   13.2     236       58 21.2

The value of the intercept is always 1, so we have

1.70049*1 0.03262*13.2 0.00234*236 0.00991*58 
## [1] 3.258094

(close enough, since I rounded some things)

You don't need to do anything with the dispersion or the inverse-link function, since the Gaussian model uses an identity link.

using the model matrix

Mathematically, the regression equation is defined as X %*% beta where beta is the vector of coefficients and X is the model matrix (for your example, it's a column of ones for the intercept plus your predictors; for models with categorical predictors or more complex terms like splines, it's a little more complicated). You can extract the model matrix from the matrix with model.matrix():

Xg <- model.matrix(model_gaus)
drop(Xg["Alabama",] %*% coef(model_gaus))

For the Gamma model, you would use exactly the same procedure, but at the end you would transform the linear expression you computed (the linear predictor) by 1/x (the inverse link function for the Gamma). (Note that you need predict(..., type = "response") to get the inverse-transformed prediction; otherwise [default type = "link"] R will give you just the plain linear expression.] If you used a log link instead you would exponentiate. More generally,

invlinkfun <- family(fitted_model)$linkinv
X <- model.matrix(fitted_model)
beta <- coef(fitted_model)
invlinkfun(X %*% beta)

The inverse Gaussian model uses a 1/mu^2 link by default; inverse.gaussian()$linkinv is function(eta) { 1/sqrt(eta) }

built-in predict()

by hand

using the model matrix

built-in `predict()`