I'm working on an assignment. I've given it an honest attempt, so I thought I'd reach out to Stack Overflow for help.
I have to run the boot procedure 10 times with a sample size of 10 and write a function that estimates the linear regression model for each set of inputs. This isn't the complete question; however, it's the part I'm stuck on. If you want the full question, let me know.
Here is the code I've attempted so far. The x and y data are meant to be treated as pairs (x_i, y_i):
rm(list=ls())
x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15)
y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9)
n=length(y)
myfunc <- function(data,index){
# Calculate and return the estimate of linear regression model
lmout <- lm(data)
return(lmout$estimate)
}
# call boot
library(boot)
bout = NULL
# Calling boot 10 times...
for(i in 1:10){
#... with a bootstrap distribution of size 10
bout = boot(data = y ~ x, statistic = myfunc, R = 10)
}
print(bout$t)
My issue is that when I print(bout$t), it displays a column with no values:
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]
[7,]
[8,]
[9,]
[10,]
Adding a print statement within myfunc (print(lmout)) returns the following output 100 times:
Coefficients:
(Intercept) x
21.321 1.771
I suspect something is going wrong either with how I'm generating my bootstrap input or with how I'm returning the estimate.
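Both suspicions can be checked directly. The sketch below (reusing the question's x and y vectors) shows that an lm object has no estimate component, so lmout$estimate is NULL, while coef() returns the intercept and slope. boot() stores each replicate as a row of t, and a statistic that returns nothing yields a matrix with zero columns, which prints as row labels with no values. The empty_t matrix is only a hypothetical stand-in for that result. Separately, because myfunc never uses index, every call refits the same full data, which matches the identical coefficients printed above.

```r
# Sketch: checking why bout$t came back empty (x, y copied from the question)
x <- c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15)
y <- c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9)

fit <- lm(y ~ x)
print(is.null(fit$estimate))  # TRUE: an lm object has no "estimate" component
print(coef(fit))              # intercept and slope are stored here instead

# Hypothetical stand-in for boot's t when the statistic returns NULL:
# one row per replicate, zero columns, so print() shows only [1,] ... [10,]
empty_t <- matrix(numeric(0), nrow = 10, ncol = 0)
print(empty_t)
print(dim(empty_t))           # 10 0
```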
Answer:
This seems like a more reasonable approach:
library(boot)
data <- data.frame(
  x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15),
  y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9))
myfunc <- function(data, index){
  # Fit the regression on the resampled rows and return intercept and slope
  lmout <- lm(y ~ x, data = data[index, ])
  coef(lmout)
}
# always run this once on the full data to see if your function makes sense
myfunc(data, seq_len(nrow(data)))
boot(data = data, statistic = myfunc,
     R = 250)
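The assignment asks for 10 separate bootstrap runs, whereas the call above does a single run with R = 250. The loop in the question also overwrote bout on every pass, so only the last run survived. A minimal sketch, assuming "sample size of 10" means R = 10 resamples per run (the reading used in the question's own code), keeps every run in a list; the names runs and all_t are illustrative only.

```r
library(boot)

# Same data and statistic as in the answer
data <- data.frame(
  x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15),
  y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9))
myfunc <- function(data, index) coef(lm(y ~ x, data = data[index, ]))

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
runs <- vector("list", 10)
for (i in 1:10) {
  # each run draws R = 10 bootstrap resamples of the 19 rows
  runs[[i]] <- boot(data = data, statistic = myfunc, R = 10)
}

print(runs[[1]]$t)  # 10 x 2 matrix: intercept and slope for each resample

# stack the replicates from all 10 runs into one 100 x 2 matrix
all_t <- do.call(rbind, lapply(runs, function(b) b$t))
print(dim(all_t))
```

Storing each boot object in a list, rather than reassigning one variable, is what lets the results of all 10 runs be inspected or combined afterwards.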
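The R = argument sets how many bootstrap resamples are drawn. The index argument is what determines each new "bootstrapped" sample: boot passes the statistic a vector of row numbers drawn with replacement, so the statistic has to subset the data with it (the original myfunc ignored index, so every replicate refit the same full data). The data also has to be passed as a data frame (or a vector or matrix) rather than a formula, because boot resamples the rows of data; a formula gives it nothing to resample. Finally, an lm object has no estimate component, so lmout$estimate was NULL, which is why bout$t had no columns; coef() returns the intercept and slope instead.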
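To make the role of index concrete, here is a hand-rolled sketch of ordinary nonparametric resampling. It is not boot's internal code, only an illustration under the assumption of simple case resampling (no strata or weights): each replicate draws n row numbers with replacement and hands them to the statistic as index. The names t_manual and R are local to this sketch.

```r
# Same data and statistic as in the answer
data <- data.frame(
  x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15),
  y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9))
myfunc <- function(data, index) coef(lm(y ~ x, data = data[index, ]))

set.seed(42)  # arbitrary seed for reproducibility
n <- nrow(data)
R <- 10
t_manual <- matrix(NA_real_, nrow = R, ncol = 2,
                   dimnames = list(NULL, c("(Intercept)", "x")))
for (r in seq_len(R)) {
  index <- sample(n, n, replace = TRUE)  # row numbers drawn with replacement
  t_manual[r, ] <- myfunc(data, index)   # refit on the resampled rows
}
print(t_manual)

# index = 1:n keeps every row once, reproducing the full-data fit
print(myfunc(data, seq_len(n)))
```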