Home > database >  OLS estimator in R
OLS estimator in R

Time:10-31

I'm trying to compute OLS estimators manually in R for given vectors and matrices, but when I get the formula beta=(x'x)^-1(x'y), R tells me that the is a dimension issue, and I can't figure out why.

My code is

nr = 100
nc = 1000 
x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
epsilon=matrix(rnorm(nr * nc, mean=0, sd=1), nrow = nr)
k=c(1,2,4,8)

eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))


y1=x eta1
y2=x eta2
y4=x eta4
y8=x eta8

beta1=inv(t(x)*x)*(t(x)*y1)
beta2=inv(t(x)*x)*(t(x)*y2)
beta4=inv(t(x)*x)*(t(x)*y4)
beta8=inv(t(x)*x)*(t(x)*y8)


Also, I feel that there should be a way to loop through the values of k to get this automated, instead of doing each eta by hand. So, a bit of help in this area would also be appreciated.

The ouput I'm looking for is to get a vector of beta for each of the different values of k.

CodePudding user response:

You have got several issues. Firstly, you should have nx1 matrix for y and epsilon, but you have nxm matrix for them instead. Secondly, you should use matrix multiplication which is %*% in R. i.e. t(x)%*%y1. However you use dot product (*) instead.

For the sake of simplicity, lets create a matrix with 5 columns. My approach is creating a dependent variable which is related with x columns (independent variables or feature matrix in machine learning terminology)

nr = 100
nc = 5
x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
epsilon=matrix(rnorm(nr, mean=0, sd=1), nrow = nr) # it should be nx1
k=c(1,2,4,8)

eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))

To check the output we should create y values wisely. So, let's define some betas and create y values wrt them. At the end, we can compare the output with the inputs we defined. Note that, you should have 5 betas for 5 columns.

# made up betas
beta1_real <- 1:5
beta2_real <- -4:0
beta4_real <- 7:11
beta8_real <- seq(0.25,1.25,0.25)

To create the y values,

y1=  10   x %*% matrix(beta1_real)   eta1
y2=  20   x %*% matrix(beta2_real)   eta2
y4=  30   x %*% matrix(beta4_real)   eta4
y8=  40   x %*% matrix(beta8_real)   eta8

Here, I also added a constant term for each y values. To get the constant term at the end, we should add ones at the beginning of our x matrix like,

x <- cbind(matrix(1,nrow = nr),x)

The rest is almost same with yours. Only difference is I used solve instead of inv and also I used the matrix multiplication (%*%)

beta1=solve(t(x)%*%x)%*%(t(x)%*%y1)
beta2=solve(t(x)%*%x)%*%(t(x)%*%y2)
beta4=solve(t(x)%*%x)%*%(t(x)%*%y4)
beta8=solve(t(x)%*%x)%*%(t(x)%*%y8)

If we compare the outputs,

beta1_real was,

# [1] 1 2 3 4 5

and the output of beta1 is,

#           [,1]
# [1,] 10.0049631
# [2,]  0.9632124
# [3,]  1.8987402
# [4,]  2.9816673
# [5,]  4.2111817
# [6,]  4.9529084

The results are similar. 10 at the beginning is the constant term I added. The difference stems from the error term applied (etas).

  • Related