R glm summary lists every value of independent variable

Time:11-26

I'm running a glm in R on a data frame with 2 variables.

str(INV)
'data.frame':   5614 obs. of  2 variables:
 $ MSACode: Factor w/ 70 levels "40","80","440",..: 37 64 58 56 66 14 38 37 66 14 ...
 $ NotPaid: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

The code I used to run it:

GlmModel <- glm(NotPaid ~ MSACode,family=binomial(link="logit"),data=training)
print(summary(GlmModel))

The summary shows a separate coefficient for each individual value of MSACode rather than a single coefficient for the field.

> print(summary(GlmModel))

Call:
glm(formula = NotPaid ~ MSACode, family = binomial(link = "logit"), 
    data = training)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9728  -0.8352  -0.6501   0.9346   2.8245  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.657e+01  1.697e+03  -0.010    0.992
MSACode80    1.462e+01  1.697e+03   0.009    0.993
MSACode440  -7.494e-07  1.924e+03   0.000    1.000
MSACode520   1.547e+01  1.697e+03   0.009    0.993
MSACode640   1.587e+01  1.697e+03   0.009    0.993
MSACode720   1.477e+01  1.697e+03   0.009    0.993
MSACode870   1.657e+01  1.697e+03   0.010    0.992
MSACode1080  1.455e+01  1.697e+03   0.009    0.993

I don't understand these results - why is it showing each MSACode value separately? Thanks.

CodePudding user response:

I'm sure this is a duplicate, but can't find it.

The problem is that MSACode is a factor (possibly because a value in that column of the input file couldn't be interpreted as numeric), so R assumes you want to treat it as a categorical rather than a continuous predictor. Hence it gives you n-1 parameters (where n is the number of levels; with the default contrasts, each one compares a level to the first, baseline level) rather than 1 to describe its effect. You can convert it back to numeric by:

INV <- transform(INV, 
    MSACode = as.numeric(as.character(MSACode)))

and then re-run your model. (A linked post explains why we need as.numeric(as.character(.)) rather than as.numeric(): applied directly to a factor, as.numeric() returns the internal integer level codes, not the original values. It also explains that as.numeric(levels(f))[f] is more efficient, although I rarely bother worrying about that level of efficiency ...)
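
To make both points concrete, here is a minimal sketch on a toy factor. The values are made up for illustration and are not the poster's data; the names f, X_factor and x are hypothetical. It shows that a factor predictor expands into one indicator column per non-baseline level, that as.numeric() alone returns level codes, and that the two conversions mentioned above recover the original numbers.

```r
# Toy factor whose labels look like numbers (illustrative values only)
f <- factor(c("40", "80", "440", "80", "40", "440"),
            levels = c("40", "80", "440"))

# As a factor, the predictor is dummy-coded under the default treatment
# contrasts: an intercept plus one indicator column per non-baseline level
X_factor <- model.matrix(~ f)
colnames(X_factor)             # "(Intercept)" "f80" "f440"

# as.numeric() on a factor returns the internal integer codes, not the labels
as.numeric(f)                  # 1 2 3 2 1 3

# Going through character recovers the original values
x <- as.numeric(as.character(f))
x                              # 40 80 440 80 40 440

# Same result: convert the levels once, then index them by the factor's codes
as.numeric(levels(f))[f]       # 40 80 440 80 40 440

# As a numeric predictor, the design matrix has a single slope column
colnames(model.matrix(~ x))    # "(Intercept)" "x"
```

With a 70-level factor like MSACode, the same expansion yields 69 indicator coefficients, which is why the summary lists each code on its own row.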
