I am trying to construct a model matrix using model.matrix. Here's my data, stored as a data frame called wILI:
date value week month year
1997-10-01 0.002734167 1 10 1997
1997-10-08 0.003612784 2 10 1997
1997-10-15 0.004757731 3 10 1997
1997-10-22 0.006238000 4 10 1997
1997-10-29 0.008132015 5 10 1997
1997-11-05 0.010522688 6 11 1997
1997-11-12 0.013487294 7 11 1997
1997-11-19 0.017080349 8 11 1997
1997-11-26 0.021308731 9 11 1997
1997-12-03 0.026101156 10 12 1997
1997-12-10 0.031279133 11 12 1997
1997-12-17 0.036542190 12 12 1997
1997-12-24 0.041482753 13 12 1997
1997-12-31 0.045640193 14 12 1997
1998-01-07 0.048587584 15 01 1998
1998-01-14 0.050025386 16 01 1998
1998-01-21 0.049847167 17 01 1998
1998-01-28 0.048152678 18 01 1998
1998-02-04 0.045207680 19 02 1998
1998-02-11 0.041371773 20 02 1998
1998-02-18 0.037022686 21 02 1998
1998-02-25 0.032498271 22 02 1998
1998-03-04 0.028064335 23 03 1998
1998-03-11 0.023905745 24 03 1998
1998-03-18 0.020133246 25 03 1998
1998-03-25 0.016798043 26 03 1998
1998-04-01 0.013908254 27 04 1998
1998-04-08 0.011443810 28 04 1998
1998-04-15 0.009368329 29 04 1998
1998-04-22 0.007637759 30 04 1998
1998-04-29 0.006206186 31 04 1998
1998-05-06 0.005029414 32 05 1998
1998-05-13 0.004066965 33 05 1998
1998-05-20 0.003282970 34 05 1998
1998-05-27 0.002646398 35 05 1998
I am testing two models for the wILI data, one with a month regressor and the other with a week regressor. That is, I want a coefficient for each month (model 1) and for each week (model 2). For the above data, the possible months are 1,2,3,4,5,10,11,12 and the possible weeks are 1,2,...,35. When I use model.matrix(~ 0 + month, wILI), it works as expected:
month01 month02 month03 month04 month05 month10 month11 month12
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0
The element in the ith row has a 1 in the column of its corresponding month, and zeros in all the other columns, just like I want. But when I try the same thing using "week" instead of "month", I get this:
week
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
...Huh?? Why am I getting a 35x1 vector? I want a 35x35 matrix where the first row has a 1 in the first column and zeros everywhere else, the second row has a 1 in the second column and zeros everywhere else, the third row has a 1 in the third column and zeros everywhere else, etc (i.e. the 35x35 identity matrix). Any suggestions for how to accomplish this? And why should the output be so different by simply changing "month" to "week"?
Answer:
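Make sure that week and month are factors (or character vectors). A numeric predictor becomes a single column in the model matrix, whereas a factor generates one column per level, or one per level except one if the formula includes an intercept. That is why the two calls behave differently: month is evidently stored as character or factor (note the leading zeros in values such as 01 and the column names month01, ..., month12), while week is numeric. If the column is already a factor or character, the factor(...) wrapper around the variable can be omitted.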
model.matrix(~ factor(month) + 0, wILI)
model.matrix(~ factor(week) + 0, wILI)
Another way to write this which gives nicer coefficient names is:
model.matrix(~ month + 0, transform(wILI, month = factor(month)))
model.matrix(~ week + 0, transform(wILI, week = factor(week)))
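To see the numeric-versus-factor difference in isolation, here is a minimal sketch on a small made-up data frame. The name toy and its six rows are hypothetical stand-ins, not the original wILI data; week is numeric and month is character, which is what the output in the question suggests.

```r
# Toy data (hypothetical, not the original wILI): six weekly observations
toy <- data.frame(
  week  = 1:6,                                    # numeric
  month = c("10", "10", "11", "11", "12", "12"),  # character
  stringsAsFactors = FALSE
)

# Numeric predictor: a single column holding the raw values
m_num <- model.matrix(~ week + 0, toy)
dim(m_num)          # 6 1

# Character predictor: treated as a factor, one indicator column per level
m_month <- model.matrix(~ month + 0, toy)
colnames(m_month)   # "month10" "month11" "month12"

# Factor predictor without intercept: one indicator column per week
m_fac <- model.matrix(~ factor(week) + 0, toy)
dim(m_fac)          # 6 6

# Converting beforehand gives the shorter names week1, ..., week6
m_named <- model.matrix(~ week + 0, transform(toy, week = factor(week)))
colnames(m_named)

# With an intercept, one level is absorbed into "(Intercept)"
m_int <- model.matrix(~ factor(week), toy)
colnames(m_int)[1]  # "(Intercept)"
```

Because every week occurs exactly once in toy, m_fac has the same entries as the 6x6 identity matrix, which is the 35x35 shape the question asks for when applied to the full data.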