How are fitted values being calculated in simple linear model of factors in R - specific example

Time:04-08

I created a simple linear model in R using factors as my predictors, but I'm having issues trying to reconcile how the fitted values are being calculated.

This is the linear model:

> summary(linear_mod_3)

Call:
lm.default(formula = sqrt(sales_price) ~ A + B + C + A:B + B:C + 
    block + replicate, data = fac_combo_full)

Residuals:
    Min      1Q  Median      3Q     Max 
-91.655 -23.715   6.383  24.044  81.514 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  487.431     11.760  41.449  < 2e-16 ***
A            -38.158      5.880  -6.490 1.08e-07 ***
B            -20.862      5.880  -3.548  0.00103 ** 
C             67.818      5.880  11.534 3.87e-14 ***
block1        -5.559     11.760  -0.473  0.63904    
replicate2   -16.668     14.403  -1.157  0.25420    
replicate3   -45.968     14.403  -3.192  0.00279 ** 
A:B           28.891      5.880   4.913 1.65e-05 ***
B:C          -43.162      5.880  -7.340 7.34e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 40.74 on 39 degrees of freedom
Multiple R-squared:  0.8764,    Adjusted R-squared:  0.851 
F-statistic: 34.55 on 8 and 39 DF,  p-value: 2.252e-15

And this is the model matrix being applied to it:

> model.matrix(linear_mod_3)
   (Intercept)  A  B  C block1 replicate2 replicate3 A:B B:C
1            1 -1 -1 -1      0          0          0   1   1
2            1  1 -1 -1      1          0          0  -1   1
3            1 -1  1 -1      1          0          0  -1  -1
4            1  1  1 -1      0          0          0   1  -1
5            1 -1 -1  1      1          0          0   1  -1
6            1  1 -1  1      0          0          0  -1  -1
7            1 -1  1  1      0          0          0  -1   1
8            1  1  1  1      1          0          0   1   1
9            1 -1 -1 -1      1          0          0   1   1
10           1  1 -1 -1      0          0          0  -1   1
11           1 -1  1 -1      0          0          0  -1  -1
12           1  1  1 -1      1          0          0   1  -1
13           1 -1 -1  1      0          0          0   1  -1
14           1  1 -1  1      1          0          0  -1  -1
15           1 -1  1  1      1          0          0  -1   1
16           1  1  1  1      0          0          0   1   1
17           1 -1 -1 -1      0          1          0   1   1
18           1  1 -1 -1      1          1          0  -1   1
19           1 -1  1 -1      1          1          0  -1  -1
20           1  1  1 -1      0          1          0   1  -1
21           1 -1 -1  1      1          1          0   1  -1
22           1  1 -1  1      0          1          0  -1  -1
23           1 -1  1  1      0          1          0  -1   1
24           1  1  1  1      1          1          0   1   1
25           1 -1 -1 -1      0          1          0   1   1
26           1  1 -1 -1      1          1          0  -1   1
27           1 -1  1 -1      1          1          0  -1  -1
28           1  1  1 -1      0          1          0   1  -1
29           1 -1 -1  1      1          1          0   1  -1
30           1  1 -1  1      0          1          0  -1  -1
31           1 -1  1  1      0          1          0  -1   1
32           1  1  1  1      1          1          0   1   1
33           1 -1 -1 -1      0          0          1   1   1
34           1  1 -1 -1      1          0          1  -1   1
35           1 -1  1 -1      0          0          1  -1  -1
36           1  1  1 -1      1          0          1   1  -1
37           1 -1 -1  1      1          0          1   1  -1
38           1  1 -1  1      0          0          1  -1  -1
39           1 -1  1  1      1          0          1  -1   1
40           1  1  1  1      0          0          1   1   1
41           1 -1 -1 -1      1          0          1   1   1
42           1  1 -1 -1      0          0          1  -1   1
43           1 -1  1 -1      1          0          1  -1  -1
44           1  1  1 -1      0          0          1   1  -1
45           1 -1 -1  1      0          0          1   1  -1
46           1  1 -1  1      1          0          1  -1  -1
47           1 -1  1  1      0          0          1  -1   1
48           1  1  1  1      1          0          1   1   1
attr(,"assign")
[1] 0 1 2 3 4 5 5 6 7
attr(,"contrasts")
attr(,"contrasts")$block
[1] "contr.treatment"

attr(,"contrasts")$replicate
[1] "contr.treatment"

For reference these are the fitted values:

> fitted(linear_mod_3)
       1        2        3        4        5        6        7        8        9       10       11       12       13       14       15 
464.3630 324.7057 445.6205 432.6451 680.7623 552.2234 500.4922 476.3983 458.8038 330.2649 451.1797 427.0858 686.3215 546.6642 494.9330 
      16       17       18       19       20       21       22       23       24       25       26       27       28       29       30 
481.9575 447.6950 308.0377 428.9525 415.9770 664.0943 535.5554 483.8242 459.7303 447.6950 308.0377 428.9525 415.9770 664.0943 535.5554 
      31       32       33       34       35       36       37       38       39       40       41       42       43       44       45 
483.8242 459.7303 418.3950 278.7377 405.2117 381.1178 634.7942 506.2554 448.9649 435.9895 412.8358 284.2969 399.6525 386.6770 640.3535 
      46       47       48 
500.6961 454.5242 430.4302 

Taking the first fitted value as an example, I would have assumed that the calculation would proceed as follows:

$$Y = 487.431 - 38.158(0) - 20.862(0) + 67.818(0) - 5.559(0) - 16.668(0) - 45.968(0) + 28.891(1) - 43.162(1)$$

I expected to get $Y \approx 464$, but the manual calculation does not give that. What am I misinterpreting in my approach? Thanks to anyone helping.

CodePudding user response:

It does work out correctly:

## First row of X (column order as in model.matrix(linear_mod_3))
x <- c(1, -1, -1, -1, 0, 0, 0, 1, 1)

## Coefficients, in the same order
b <- c(487.431, -38.158, -20.862, 67.818, -5.559,
       -16.668, -45.968, 28.891, -43.162)

## Prediction
sum(x * b)
#> [1] 464.362

Created on 2022-04-07 by the reprex package (v2.0.1)
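
The same dot product extends to several rows at once as a matrix product. A minimal sketch, with the coefficient estimates from the summary and the first three rows of the model matrix in the question typed in by hand (the names `b`, `X` and `fit_manual` are only illustrative):

```r
# Coefficient estimates copied from summary(linear_mod_3), in column order
b <- c(`(Intercept)` = 487.431, A = -38.158, B = -20.862, C = 67.818,
       block1 = -5.559, replicate2 = -16.668, replicate3 = -45.968,
       `A:B` = 28.891, `B:C` = -43.162)

# Rows 1 to 3 copied from model.matrix(linear_mod_3)
X <- rbind(
  c(1, -1, -1, -1, 0, 0, 0,  1,  1),
  c(1,  1, -1, -1, 1, 0, 0, -1,  1),
  c(1, -1,  1, -1, 1, 0, 0, -1, -1)
)

# One fitted value per row: each is sum(row * b)
fit_manual <- drop(X %*% b)
round(fit_manual, 3)
#> [1] 464.362 324.705 445.621
```

These agree with the first three entries of `fitted(linear_mod_3)` up to the rounding of the printed coefficients.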

Note that A, B and C, as well as A:B and B:C, are coded as -1 and 1, not as 0 and 1 as in your proposed equation.
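
The `attr(,"contrasts")` part of the model matrix output lists only `block` and `replicate`, which suggests that A, B and C entered the model as numeric -1/1 columns rather than as factors. A small sketch of the difference, using a made-up data frame `d` (hypothetical, not the data from the question):

```r
# Hypothetical toy data: one predictor stored as numeric -1 / 1
d <- data.frame(A = c(-1, 1, -1, 1))

# Numeric predictor: the values are used in the model matrix as they are
mm_num <- model.matrix(~ A, data = d)
mm_num[, "A"]            # -1, 1, -1, 1

# Same values as a factor: default treatment contrasts give a 0 / 1 dummy
mm_fac <- model.matrix(~ factor(A), data = d)
mm_fac[, "factor(A)1"]   # 0, 1, 0, 1
```

With the numeric coding, an interaction column such as A:B is simply the product of the two -1/1 columns, which is why it also takes the values -1 and 1.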

CodePudding user response:

I figured things out, but my gosh did I make them confusing. Since my model matrix uses the $(-1, 1)$ coding, the calculation I attempted above should actually be the following:

$$Y = 487.431 - 38.158(-1) - 20.862(-1) + 67.818(-1) - 5.559(0) - 16.668(0) - 45.968(0) + 28.891(1) - 43.162(1)$$

It turns out that, because I stacked model matrices (among a few other things), I mixed up several different factor coding schemes. I would not advise this for people in the future.
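
One way to catch this kind of mix-up early is to check that the model matrix times the coefficient vector reproduces `fitted()`. A minimal sketch on the built-in `mtcars` data (the model here is an arbitrary illustration, not the one from the question):

```r
# Arbitrary illustrative model with a transformed response and a factor
fit <- lm(sqrt(mpg) ~ wt + factor(cyl), data = mtcars)

X <- model.matrix(fit)            # the design matrix lm() actually used
manual <- drop(X %*% coef(fit))   # one sum(x * b) per row

# TRUE when the hand computation matches fitted(), up to floating-point tolerance
all.equal(unname(manual), unname(fitted(fit)))

# Fitted values are on the sqrt scale here; square them to return to mpg
head(manual^2)
```

If the comparison fails, the matrix being multiplied is not the one the model was fitted with, which is usually a sign that the coding of some predictor differs from what was assumed.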
