I have a dataset called car_data with two columns. I want to subtract one column from the other and then use bootstrapping to get a different value of average_distance in each replicate, but my current code returns just one value repeated multiple times.
I want average_distance to contain different values, e.g. 0.05, 0.7, 0.6, 0.9, 0.10, etc., but I am getting 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, etc.
average_distance <- c()
Bootss <- 10
total <- 5000
for (i in seq(Bootss)){
  car_diff <- car_data[,1] - car_data[,2]
  cars <- subset(car_diff, car_diff > 0)
  for (i in 1:length(seq(Bootss))){
    average_distance[i] <- length(cars)/length(total)
  }
}
Answer:
I interpret your post as wanting to bootstrap the mean of the difference between two variables. Without commenting on whether this is appropriate from a statistical standpoint, here is the code to do so:
library(boot)
# create some dummy data
car_data <- data.frame(col1 = rnorm(1e2, 3, 2),
                       col2 = rnorm(1e2, 7, 1))
car_diff <- car_data[ , 1 ] - car_data[ , 2 ]
# define a function to calculate the statistic of interest
mean_func <- function(x, idx) {
  mean(x[ idx ])
}
# perform bootstrap
boot_obj <- boot(data = car_diff, statistic = mean_func, R = 999)
print(boot_obj)
# obtain confidence intervals
boot.ci(boot_obj) |> print()
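For anyone curious what boot() is doing here, the same nonparametric bootstrap of the mean can be written by hand in base R. This is a minimal sketch, not the boot package's internals: each replicate draws length(car_diff) indices with replacement via sample() and recomputes the mean, and a simple percentile interval is taken with quantile(). The seed and dummy data are illustrative assumptions.

```r
set.seed(42)  # illustrative seed so the run is reproducible

# illustrative dummy data, same shape as the example above
car_data <- data.frame(col1 = rnorm(1e2, 3, 2),
                       col2 = rnorm(1e2, 7, 1))
car_diff <- car_data[, 1] - car_data[, 2]

R <- 999
boot_means <- numeric(R)
for (b in seq_len(R)) {
  # resample indices WITH replacement, same size as the original data
  idx <- sample(seq_along(car_diff), size = length(car_diff), replace = TRUE)
  boot_means[b] <- mean(car_diff[idx])
}

head(boot_means)                       # replicates differ from one another
sd(boot_means)                         # bootstrap standard error of the mean
quantile(boot_means, c(0.025, 0.975))  # simple 95% percentile interval
```

boot.ci() uses its own interpolation for percentile intervals, so its "perc" interval may differ slightly from the quantile() result even on identical replicates.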
Answer:
The problem I see is that every time your loop runs, it calculates the average distance from the same rows, which is why you get the same result each time. You need to sample the data first and then perform your calculation. Here is an example where I take a random 50% of the rows and then perform the same calculation as in your code. Notice that the outputs all differ because each replicate uses a different subset. When I take 100% of the rows every time, the results are all the same, because sampling 100% without replacement only reorders the same rows.
library(tidyverse)
Bootss <- 10
total <- 5
# sample 50%
vals <- replicate(n = Bootss, expr = {
  carz <- mtcars |>
    slice_sample(prop = 0.5) |>
    mutate(diff = gear - carb) |>
    filter(diff > 0)
  return(nrow(carz)/total)
})
vals
#> [1] 2.4 1.2 2.2 2.0 1.8 1.8 1.8 2.4 1.2 2.0
# sample 100%
vals2 <- replicate(n = Bootss, expr = {
  carz <- mtcars |>
    slice_sample(prop = 1) |>
    mutate(diff = gear - carb) |>
    filter(diff > 0)
  return(nrow(carz)/total)
})
vals2
#> [1] 3.6 3.6 3.6 3.6 3.6 3.6 3.6 3.6 3.6 3.6
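A standard nonparametric bootstrap keeps the full sample size but resamples with replacement, so even a 100%-sized draw changes from replicate to replicate. The sketch below does that in base R on mtcars. It assumes the statistic of interest is the share of cars with gear - carb > 0, dividing by the number of resampled rows instead of a fixed total; that denominator is my guess at the intent of length(cars)/total in the question, and the seed is illustrative.

```r
set.seed(1)  # illustrative seed for reproducibility

Bootss <- 10
gc_diff <- mtcars$gear - mtcars$carb  # per-car difference, computed once

vals3 <- replicate(Bootss, {
  # draw nrow(mtcars) row indices WITH replacement
  idx <- sample(nrow(mtcars), replace = TRUE)
  # share of resampled cars with a positive difference
  mean(gc_diff[idx] > 0)
})
vals3
```

In the tidyverse version above, the equivalent change is slice_sample(prop = 1, replace = TRUE).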