Selecting consecutive values from a bootstrap sample in R with repeated values

I'm not sure how to approach this in R. I have a data set with 40 values, some of which repeat, and I want to run a small bootstrap on it to find the mean of two or more consecutive values. For example, suppose I randomly select the very first value in the data set below, which is 0.2, so x1=0.2. How can I make sure that, in the same iteration of the for loop, R selects the next value in the data set as x2? Here the second value is also 0.2, so the result would be x1=0.2 and x2=0.2.

I can't think of a way to do this, because it has to be repeated in every iteration. Since sample() returns a random value rather than its position, the repeated values make it hard to tell which element was actually selected.

I've provided sample code that calculates the mean for one observation, and I would like to extend it to two consecutive observations so that I can calculate the two means separately and display them.

If anyone has any way to handle this I would appreciate it. Thanks ahead of time.

x=c(0.20,0.20,0.21,0.21,0.21,0.20,0.19,0.18,0.16,0.10,
      0.02,-0.02,0.01,0.03,0.07,0.14,0.22,0.13,0.12,
      0.16,0.17,0.18,0.18,0.17,0.15,0.15,0.13,0.12,
      0.10,0.08,0.06,0.04,0.03,0.02,0.03,0.05,0.34,
      0.13,0.11,0.12)
B<- 500
result1<- numeric(B)
# result2<- numeric(B)
for (b in 1:B){
  x1<-sample(x=x,size =1, replace=TRUE)
#  x2<-
  result1[b]<-x1
#  result2[b]<-x2
}
mean1<- mean(result1)
# mean2<- mean(result2)

CodePudding user response：

A simple approach could be:

result <- matrix(nrow = B, ncol = 2)

for (b in 1:B){
  idx1 <- sample(seq_along(x), size = 1)
  idx2 <- idx1 %% length(x) + 1
  result[b, 1] <- x[idx1]
  result[b, 2] <- x[idx2]
}

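This stores the results in a matrix, with the sampled value in the first column and its successor in the second (first 10 rows shown):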

> result
        [,1]  [,2]
  [1,]  0.21  0.21
  [2,]  0.12  0.20
  [3,]  0.21  0.21
  [4,]  0.10  0.02
  [5,]  0.10  0.02
  [6,]  0.21  0.20
  [7,]  0.02 -0.02
  [8,] -0.02  0.01
  [9,]  0.21  0.20
 [10,]  0.17  0.15
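
To get the two means the question asks for, the columns of this matrix can be averaged directly. The following is a minimal sketch, assuming x and B as defined in the question; the set.seed() call is an addition of mine, included only so that the draw is reproducible.

```r
x <- c(0.20, 0.20, 0.21, 0.21, 0.21, 0.20, 0.19, 0.18, 0.16, 0.10,
       0.02, -0.02, 0.01, 0.03, 0.07, 0.14, 0.22, 0.13, 0.12,
       0.16, 0.17, 0.18, 0.18, 0.17, 0.15, 0.15, 0.13, 0.12,
       0.10, 0.08, 0.06, 0.04, 0.03, 0.02, 0.03, 0.05, 0.34,
       0.13, 0.11, 0.12)
B <- 500
set.seed(1)  # assumption: fixed seed, only for reproducibility

result <- matrix(nrow = B, ncol = 2)
for (b in 1:B) {
  idx1 <- sample(seq_along(x), size = 1)  # a random position, not a random value
  idx2 <- idx1 %% length(x) + 1           # next position; 40 wraps around to 1
  result[b, 1] <- x[idx1]
  result[b, 2] <- x[idx2]
}

# One mean per column: the sampled value and its successor
means <- colMeans(result)
means
```

Because the positions are sampled rather than the values, repeated values such as the two leading 0.20 entries are no longer ambiguous: each draw knows exactly which element it came from.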

CodePudding user response：

Sample the indices of x, then use the sampled index to subset x for result1, and use the sampled index + 1 to subset x for result2. You also need a wrap-around, so that if you sample the last element of x, the first element is taken as the "next" value:

B <- 500

result1<- numeric(B)
result2 <- numeric(B)

for(i in 1:B) {
  j <- sample(seq_along(x), 1)
  if(j == 40) k <- 1 
  else k <- j + 1
  result1[i] <- x[j]
  result2[i] <- x[k]
}
mean(result1)
#> [1] 0.12618
mean(result2)
#> [1] 0.13034

Note also that since R is vectorized, you don't need a loop here at all. You could just do:

result1 <- sample(seq_along(x), 500, replace = TRUE)
result2 <- result1 + 1
result2[result2 == 41] <- 1

mean(x[result1])
#> [1] 0.12568

mean(x[result2])
#> [1] 0.12596
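
Since the question mentions "two or more" consecutive values, the same vectorized idea might be extended to runs of any length k. The sketch below is one possible way to do it, not something taken from the answers above: consecutive_sample() is a hypothetical helper name, and it assumes the same wrap-around rule (the element after the last one is the first).

```r
# Hypothetical helper: draw B random start positions and return the k
# consecutive values beginning at each one, wrapping from the end of x
# back to its beginning.
consecutive_sample <- function(x, k, B) {
  n <- length(x)
  start <- sample.int(n, B, replace = TRUE)
  # B x k matrix of positions: start, start + 1, ..., start + k - 1
  idx <- outer(start, seq_len(k) - 1, `+`)
  # Map positions n + 1, n + 2, ... back to 1, 2, ...
  idx <- (idx - 1) %% n + 1
  # Look up the values and restore the B x k shape
  matrix(x[idx], nrow = B, ncol = k)
}

x <- c(0.20, 0.20, 0.21, 0.21, 0.21, 0.20, 0.19, 0.18, 0.16, 0.10,
       0.02, -0.02, 0.01, 0.03, 0.07, 0.14, 0.22, 0.13, 0.12,
       0.16, 0.17, 0.18, 0.18, 0.17, 0.15, 0.15, 0.13, 0.12,
       0.10, 0.08, 0.06, 0.04, 0.03, 0.02, 0.03, 0.05, 0.34,
       0.13, 0.11, 0.12)

set.seed(1)  # assumption: fixed seed, only for reproducibility
res <- consecutive_sample(x, k = 3, B = 500)
colMeans(res)  # mean of the 1st, 2nd and 3rd value across all draws
```

With k = 2 this should behave like the result1/result2 pair above. If the quantity of interest is instead the mean of each run of consecutive values, rowMeans(res) gives one mean per draw.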

Created on 2022-03-28