How to interpret glmer output with a 3 level factor encoded as contrast sum-CodePudding

I have a question with regard to interpretations of glmer models:

I've fitted a model with 3 predictors: PA, PB and PC. PA and PB have two levels, PC (which is the position of the verb) has 3 levels (verbposition 1, verbposition 2, verbposition 3).

The predictors were encoded as sum contrasts using contr.sum in R:

constrasts(data$PA) = contr.sum(levels(data$PA) 
constrasts(data$PB) = contr.sum(levels(data$PB) 
constrasts(data$PC) = contr.sum(levels(data$PC)

When I was running the model, the results show various significant main effects and interactions, also with PC. But I don't understand how to interpret the levels of PC: PC1 and PC2.

I've checked the levels of PC with contr.sum(levels(data$PC) and got the following matrix: 1: 1 0 -1 ; 2: 0 1 -1

But what exactly tells this? As far as I understand, PC1 contains the vector 1,0,-1 and PC2 the vector 0,1,-1. It seems that level 3 is the reference - is that correct? And if this is the case, how can I interpret the main effects of PC1 and PC2? How do they refer to verbposition 1, 2 or 3?

Any help is greatly appreciated.

CodePudding user response：

Here's an example with the mtcars data.

library(dplyr)
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
contrasts(mtcars$cyl) <- contr.sum(n=c("4", "6", "8"))

contrasts(mtcars$cyl) 
#>   [,1] [,2]
#> 4    1    0
#> 6    0    1
#> 8   -1   -1

What the contrasts statement says is that the coefficient for the first regressor cyl1 is for cyl = 4, the second is for cyl = 6 and cyl = 8 is represented by the sum of the negative effects for the other two categories. The reference here is not any one category, but the mean of all the group means: Below is all of the group means and then the mean of the group means.

mtcars %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg))
#> # A tibble: 3 × 2
#>   cyl     mpg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

mtcars %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg)) %>% 
  select(mpg) %>% 
  pull %>% 
  mean
#> [1] 20.50216

In the model summary, you can see that the group intercept is the mean of the means. The mean of mpg for cyl = 4 is 26.7 according to the output above. Adding the cyl1 coefficient to the intercept gets: 20.5 6.2=26.7. The mean for the cyl=6 group is 19.7. We could get this by adding the cyl2 coefficient to the intercept: 20.5 - .8 = 19.7. Finally the mean of mpg for the cyl=8 group is 15.1. We get this with: 20.5 -6.2 -(-.8)=15.1. So, the way we interpret the coefficients below is that four cylinder cars have expected miles per gallon about 6 higher than the mean of all the means. Six cylinder cars have expected miles per gallon around .8 smaller than the mean of all the means. Finally, eight cylinder cars have expected miles per gallon roughly -6.2 .8 or 5.4 lower than the mean of all the group means.

mod2 <- lm(mpg ~ cyl, data=mtcars)
summary(mod2)
#> 
#> Call:
#> lm(formula = mpg ~ cyl, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.2636 -1.8357  0.0286  1.3893  7.2364 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  20.5022     0.5935  34.543  < 2e-16 ***
#> cyl1          6.1615     0.8167   7.544 2.57e-08 ***
#> cyl2         -0.7593     0.9203  -0.825    0.416    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.223 on 29 degrees of freedom
#> Multiple R-squared:  0.7325, Adjusted R-squared:  0.714 
#> F-statistic:  39.7 on 2 and 29 DF,  p-value: 4.979e-09

^{Created on 2022-11-02 by the reprex package (v2.0.1)}

Testing the Effect of the omitted group.

Since we know the omitted group's coefficient is -cyl1 -cyl2 = -cyl1 - cyl2, we could use the linearHypothesis() function from the car package to test whether this is significantly different from the mean of means:

car::linearHypothesis(mod2, "-cyl1 - cyl2")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> - cyl1 - cyl2 = 0
#> 
#> Model 1: restricted model
#> Model 2: mpg ~ cyl
#> 
#>   Res.Df    RSS Df Sum of Sq     F    Pr(>F)    
#> 1     30 806.86                                 
#> 2     29 301.26  1     505.6 48.67 1.139e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, we see that it is.