I created a simple linear model in R using factors as my predictors, but I'm having issues trying to reconcile how the fitted values are being calculated.
This is the linear model:
> summary(linear_mod_3)
Call:
lm.default(formula = sqrt(sales_price) ~ A + B + C + A:B + B:C +
block + replicate, data = fac_combo_full)
Residuals:
Min 1Q Median 3Q Max
-91.655 -23.715 6.383 24.044 81.514
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 487.431 11.760 41.449 < 2e-16 ***
A -38.158 5.880 -6.490 1.08e-07 ***
B -20.862 5.880 -3.548 0.00103 **
C 67.818 5.880 11.534 3.87e-14 ***
block1 -5.559 11.760 -0.473 0.63904
replicate2 -16.668 14.403 -1.157 0.25420
replicate3 -45.968 14.403 -3.192 0.00279 **
A:B 28.891 5.880 4.913 1.65e-05 ***
B:C -43.162 5.880 -7.340 7.34e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 40.74 on 39 degrees of freedom
Multiple R-squared: 0.8764, Adjusted R-squared: 0.851
F-statistic: 34.55 on 8 and 39 DF, p-value: 2.252e-15
And this is the model matrix being applied to it:
> model.matrix(linear_mod_3)
(Intercept) A B C block1 replicate2 replicate3 A:B B:C
1 1 -1 -1 -1 0 0 0 1 1
2 1 1 -1 -1 1 0 0 -1 1
3 1 -1 1 -1 1 0 0 -1 -1
4 1 1 1 -1 0 0 0 1 -1
5 1 -1 -1 1 1 0 0 1 -1
6 1 1 -1 1 0 0 0 -1 -1
7 1 -1 1 1 0 0 0 -1 1
8 1 1 1 1 1 0 0 1 1
9 1 -1 -1 -1 1 0 0 1 1
10 1 1 -1 -1 0 0 0 -1 1
11 1 -1 1 -1 0 0 0 -1 -1
12 1 1 1 -1 1 0 0 1 -1
13 1 -1 -1 1 0 0 0 1 -1
14 1 1 -1 1 1 0 0 -1 -1
15 1 -1 1 1 1 0 0 -1 1
16 1 1 1 1 0 0 0 1 1
17 1 -1 -1 -1 0 1 0 1 1
18 1 1 -1 -1 1 1 0 -1 1
19 1 -1 1 -1 1 1 0 -1 -1
20 1 1 1 -1 0 1 0 1 -1
21 1 -1 -1 1 1 1 0 1 -1
22 1 1 -1 1 0 1 0 -1 -1
23 1 -1 1 1 0 1 0 -1 1
24 1 1 1 1 1 1 0 1 1
25 1 -1 -1 -1 0 1 0 1 1
26 1 1 -1 -1 1 1 0 -1 1
27 1 -1 1 -1 1 1 0 -1 -1
28 1 1 1 -1 0 1 0 1 -1
29 1 -1 -1 1 1 1 0 1 -1
30 1 1 -1 1 0 1 0 -1 -1
31 1 -1 1 1 0 1 0 -1 1
32 1 1 1 1 1 1 0 1 1
33 1 -1 -1 -1 0 0 1 1 1
34 1 1 -1 -1 1 0 1 -1 1
35 1 -1 1 -1 0 0 1 -1 -1
36 1 1 1 -1 1 0 1 1 -1
37 1 -1 -1 1 1 0 1 1 -1
38 1 1 -1 1 0 0 1 -1 -1
39 1 -1 1 1 1 0 1 -1 1
40 1 1 1 1 0 0 1 1 1
41 1 -1 -1 -1 1 0 1 1 1
42 1 1 -1 -1 0 0 1 -1 1
43 1 -1 1 -1 1 0 1 -1 -1
44 1 1 1 -1 0 0 1 1 -1
45 1 -1 -1 1 0 0 1 1 -1
46 1 1 -1 1 1 0 1 -1 -1
47 1 -1 1 1 0 0 1 -1 1
48 1 1 1 1 1 0 1 1 1
attr(,"assign")
[1] 0 1 2 3 4 5 5 6 7
attr(,"contrasts")
attr(,"contrasts")$block
[1] "contr.treatment"
attr(,"contrasts")$replicate
[1] "contr.treatment"
For reference these are the fitted values:
> fitted(linear_mod_3)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
464.3630 324.7057 445.6205 432.6451 680.7623 552.2234 500.4922 476.3983 458.8038 330.2649 451.1797 427.0858 686.3215 546.6642 494.9330
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
481.9575 447.6950 308.0377 428.9525 415.9770 664.0943 535.5554 483.8242 459.7303 447.6950 308.0377 428.9525 415.9770 664.0943 535.5554
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
483.8242 459.7303 418.3950 278.7377 405.2117 381.1178 634.7942 506.2554 448.9649 435.9895 412.8358 284.2969 399.6525 386.6770 640.3535
46 47 48
500.6961 454.5242 430.4302
Taking the first fitted value as an example, I assumed the calculation would proceed as follows:
$$Y = 487.432 - 38.158(0) - 20.862(0) + 67.818(0) - 5.559(0) - 16.668(0) - 45.968(0) + 28.891(1) - 43.162(1)$$
I expected to get $Y = 464$, but when I do this by hand I don't. What am I misinterpreting in my approach? Thanks to anyone who can help.
CodePudding user response:
It does work out correctly:
## First row of X (the model matrix)
x <- c(1, -1, -1, -1, 0, 0, 0, 1, 1)
## Coefficients, in the same column order as X
b <- c(487.431, -38.158, -20.862, 67.818, -5.559, -16.668, -45.968, 28.891, -43.162)
## Prediction: dot product of the row and the coefficients
sum(x * b)
#> [1] 464.362
Created on 2022-04-07 by the reprex package (v2.0.1)
Note that A, B and C, as well as A:B and B:C, are coded as -1 and 1, not as 0 and 1 as in your proposed equation.
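The same check can be done for every row at once, since the fitted values are just the model matrix multiplied by the coefficient vector. Below is a minimal sketch on simulated data; the data frame `d`, the response `y` and the effect sizes are made up for illustration and are not your `fac_combo_full`.

```r
# Hypothetical data: a replicated 2^3 design with -1/+1 coded factors
set.seed(1)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                 replicate = factor(1:3))
d$y <- 500 - 40 * d$A - 20 * d$B + 70 * d$C + 30 * d$A * d$B +
  rnorm(nrow(d), sd = 10)

fit <- lm(y ~ A + B + C + A:B + B:C + replicate, data = d)

# Fitted values by hand: each row of X times the coefficient vector
X <- model.matrix(fit)
manual <- drop(X %*% coef(fit))

# Should agree with fitted() up to floating-point tolerance
agree <- isTRUE(all.equal(unname(manual), unname(fitted(fit))))
agree
```

If `agree` is `FALSE` on your own model, the usual cause is a mismatch between the coding in the matrix you are multiplying by and the one `lm` actually used.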
CodePudding user response:
I figured things out, but my gosh did I make them confusing. Because my model matrix uses a $(-1, 1)$ coding for A, B and C, the calculation I attempted above should actually be the following:
$$Y = 487.432 - 38.158(-1) - 20.862(-1) + 67.818(-1) - 5.559(0) - 16.668(0) - 45.968(0) + 28.891(1) - 43.162(1)$$
It turns out that by stacking model matrices, among a few other things, I mixed up several different factor codings. I would not advise that for anyone in the future.
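For anyone hitting the same confusion, one quick way to see which coding R is using is to build the model matrix for a single factor under different contrasts. This is only a sketch with a made-up two-level factor `f` (not from my data), and it assumes R's default treatment contrasts for unordered factors.

```r
# Hypothetical two-level factor, purely for illustration
f <- factor(c("low", "high", "low", "high"), levels = c("low", "high"))

# Treatment coding (R's default for unordered factors): column is 0/1
mm_treatment <- model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))

# Sum-to-zero coding: column is +1/-1
mm_sum <- model.matrix(~ f, contrasts.arg = list(f = "contr.sum"))

# Compare the second column of each matrix side by side
cbind(treatment = mm_treatment[, 2], sum = mm_sum[, 2])
```

The coefficients only make sense when multiplied by the columns from the same coding that the model was fitted with, which is exactly what I got wrong.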