How to choose length of "i" vector for "boot" function?

Time:10-10

I have a dataframe that has 20,000 observations and I want to bootstrap 700 of those observations, calculate the mean and repeat for 1,000 runs. I know how to code this myself but I was trying to use the "boot" library because of the great plotting and CI options.

df <- seq(1, 20000, 1)
meanfun <- function(data, ind) {
  return(mean(data[ind]))   
}

library(boot)
results <- boot(df, statistic=meanfun, R=1000)  # 1,000 runs, as described above

I have read the documentation and I haven't seen how to CHOOSE the length of "ind".
If I were going to do it the hard way, I would use this code:

df <- seq(1, 20000, 1) # vector of 20000 observations
meanfun <- function(data) {
  return(mean(data))
} # function to calculate mean

S <- numeric(1000) # Vector to store 1000 values from random sampling
for (i in 1:1000) {
  one_sample <- sample(df, 700) # sample 700 random observations
  print(one_sample)
  one_result <- meanfun(one_sample) # find mean of that sampling
  S[i] <- one_result # Store that value
}
S
meanfun(S) # average value of 1000 values

But how do I choose to only randomly sample 700 observations 1000 times using the boot function?

Thanks in advance!

CodePudding user response:

What you are doing is subsampling without replacement rather than bootstrapping. As far as I know, this is not possible with boot, since ind is resampled with replacement and I don't see any way to subset it. However, I don't use boot and might be wrong.
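To illustrate the point about ind, here is a minimal sketch, assuming boot's default settings (ordinary bootstrap, statistic called with an index vector). The helper ind_info is hypothetical and only introduced for this illustration: instead of computing a mean, it reports how many indices boot passes in and how many of them are distinct.

```r
library(boot)  # recommended package, shipped with standard R installations

x <- seq(1, 20000, 1)

# Hypothetical helper statistic: report the number of indices received
# and the number of distinct indices among them.
ind_info <- function(data, ind) {
  c(n_indices = length(ind), n_distinct = length(unique(ind)))
}

set.seed(1)
b <- boot(x, statistic = ind_info, R = 5)

b$t0  # original data: ind is 1:20000, so both entries equal 20000
b$t   # replicates: always 20000 indices, but fewer distinct ones
```

Each replicate receives as many indices as there are observations, with repeats, which is why the length of ind is not something you choose in the call above.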

Actually, there is a less cumbersome way: just define your subsampling FUNction and replicate it.

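x <- seq(1, 20000, 1)  # the data (called df in the question)
FUN <- function() mean(sample(x, 700))  # mean of 700 draws without replacement

R <- 2e4  # number of replications
set.seed(49076)
S <- replicate(R, FUN())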

That's it. You can easily calculate the mean,

mean(S)
# [1] 10000.16

and percentile confidence intervals.

## 95% CI
quantile(S, probs=c(.025, .975))
#      2.5%    97.5% 
# 9584.945 10412.164 
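As a rough sanity check on the simulated interval, the spread of S can be compared with the textbook standard error of a mean under simple random sampling without replacement, which includes a finite population correction. This is only a sketch: the names N, m and se_theory are introduced here for illustration, and x, the seed and the number of replications are repeated so the block runs on its own.

```r
x <- seq(1, 20000, 1)
N <- length(x)  # population size
m <- 700        # subsample size

set.seed(49076)
S <- replicate(2e4, mean(sample(x, m)))

# Standard error of the mean when sampling without replacement:
# sqrt(var(x) / m * (1 - m / N)); var() uses the N - 1 denominator
se_theory <- sqrt(var(x) / m * (1 - m / N))

# Simulated versus analytic standard error
c(simulated = sd(S), theory = se_theory)

# Normal-approximation 95% interval, for comparison with the percentile one
mean(x) + c(-1, 1) * qnorm(0.975) * se_theory
```

If the two standard errors roughly agree, the percentile interval from the simulation is behaving as expected for subsampling without replacement.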

Mimicking the plotting functionality of boot is also straightforward.

op <- par(mfrow=c(1, 2))

hist(S, breaks='FD', freq=FALSE)
qqnorm(S); qqline(S)

par(op)

[Image: histogram of S (left) and normal Q-Q plot of S (right)]
