Is there a way to display the reference category in a regression output in R?

Time:12-24

I am estimating a regression model with some factor/categorical variables and some numerical ones. Is it possible to display the reference category for each factor/categorical variable in the summary of the regression model?

Ideally this would translate also to texreg or stargazer to have latex output, but having them in the summary of the regression would already be a good start.

Does anybody have an idea? What am I missing?

CodePudding user response:

The reference level is the one that is missing from the summary: the coefficients of the other levels are contrasts with the reference level, so the intercept represents the expected value in the reference category (here at Petal.Length = 0, because the model also contains a numeric predictor).

iris <- transform(iris, Species_=factor(Species))  ## create factor

summary(lm(Sepal.Length ~ Petal.Length + Species_, iris))$coefficients
#                    Estimate Std. Error   t value      Pr(>|t|)
# (Intercept)         3.6835266 0.10609608 34.718780 1.968671e-72
# Petal.Length        0.9045646 0.06478559 13.962436 1.121002e-28
# Species_versicolor -1.6009717 0.19346616 -8.275203 7.371529e-14
# Species_virginica  -2.1176692 0.27346121 -7.743947 1.480296e-12

You could remove the intercept to get the missing level displayed, but that does not make much sense. You then just get a separate intercept for each level, with no reference, whereas what you are interested in is the contrast between the reference level and the other levels.

summary(lm(Sepal.Length ~ 0 + Petal.Length + Species_, iris))$coefficients
#                     Estimate Std. Error   t value     Pr(>|t|)
# Petal.Length       0.9045646 0.06478559 13.962436 1.121002e-28
# Species_setosa     3.6835266 0.10609608 34.718780 1.968671e-72
# Species_versicolor 2.0825548 0.28009598  7.435147 8.171219e-12
# Species_virginica  1.5658574 0.36285224  4.315413 2.921850e-05
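The two outputs above are the same model in different parameterizations. A minimal sketch of that check, assuming the built-in iris data; the object names (dat, fit_ref, fit_noint, rebuilt, direct) are made up for illustration:

```r
# Same factor as above, built on a local copy of the data
dat <- transform(iris, Species_ = factor(Species))

fit_ref   <- lm(Sepal.Length ~ Petal.Length + Species_, dat)      # with intercept
fit_noint <- lm(Sepal.Length ~ 0 + Petal.Length + Species_, dat)  # without intercept

b <- coef(fit_ref)

# Rebuild the per-level values: intercept plus the contrast of each level
rebuilt <- c(
  setosa     = unname(b["(Intercept)"]),
  versicolor = unname(b["(Intercept)"] + b["Species_versicolor"]),
  virginica  = unname(b["(Intercept)"] + b["Species_virginica"])
)

# The same values read directly from the no-intercept fit
direct <- unname(coef(fit_noint)[paste0("Species_", names(rebuilt))])

all.equal(unname(rebuilt), direct)
# [1] TRUE
```

So nothing is gained by dropping the intercept; the reference level's value is already the intercept of the first model.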

If you're not sure: with R's default treatment contrasts, the reference level is the first level of the factor.

levels(iris$Species_)[1]
# [1] "setosa"
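Another way to see this, as a small sketch assuming the default treatment contrasts are in effect, is to look at the dummy coding R uses for the factor. The reference level is the row that is all zeros. The name f is only used for this illustration:

```r
# Fresh copy of the factor so the example does not depend on earlier code
f <- factor(iris$Species)

# Coding matrix: one column per non-reference level
contrasts(f)
#            versicolor virginica
# setosa              0         0
# versicolor          1         0
# virginica           0         1
```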

To confirm that the first level is the reference, set a different reference level with relevel() and check that it now comes first.

iris$Species_ <- relevel(iris$Species_, ref='versicolor')

levels(iris$Species_)[1]
# [1] "versicolor"
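Refitting the model shows the effect in the coefficient table. This is a sketch on a local copy of the data (dat and fit are names chosen for illustration): versicolor disappears from the table and setosa shows up as a contrast instead.

```r
# Make versicolor the reference level on a local copy of iris
dat <- transform(iris, Species_ = relevel(factor(Species), ref = "versicolor"))

fit <- lm(Sepal.Length ~ Petal.Length + Species_, dat)

# Only the non-reference levels appear as coefficients
rownames(summary(fit)$coefficients)
# [1] "(Intercept)"       "Petal.Length"      "Species_setosa"    "Species_virginica"
```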

It is common to refer to the reference level in a note under the table in the report, and I recommend that you do the same.
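Such a note can be generated from the fitted model rather than typed by hand. The sketch below assumes default treatment contrasts, so that the first entry of each factor in the model's xlevels component is the reference; ref_levels and note are hypothetical names:

```r
dat <- transform(iris, Species_ = factor(Species))
fit <- lm(Sepal.Length ~ Petal.Length + Species_, dat)

# xlevels holds the levels of every factor used in the fit;
# under treatment contrasts the first one is the reference
ref_levels <- vapply(fit$xlevels, function(lv) lv[1], character(1))

note <- paste0(
  "Reference categories: ",
  paste0(names(ref_levels), " = ", ref_levels, collapse = "; ")
)
note
# [1] "Reference categories: Species_ = setosa"
```

The resulting string could then be handed to whatever produces the table. As far as I know, stargazer takes it through its notes argument and texreg through custom.note, but check the documentation of the package version you use.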
