I have the following code that selects 4 rows of iris 1000x, and takes the mean of each 4 row sample:
library(dplyr)
iris<- iris
storage<- list()
counter<- 0
for (i in 1:1000) {
# sample 3 randomly selected transects 100 time
tempsample<- iris[sample(1:nrow(iris), 4, replace=F),]
storage[[i]]=tempsample
counter<- counter 1
print(counter)
}
# Unpack results into dataframe
results<- do.call(rbind, storage)
View(results)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/4)),each = 4))
# View(results_2)
final_results<- aggregate(results_2[,1:4], list(results_2$Aggregate), mean)
# View(final_results)
I want to calculate the bias of each column in relation to their true population parameter. For example using SimDesign
's bias()
:
library(SimDesign)
(bias(final_results[,2:5], parameter=c(5,3,2,1), type='relative'))*100
In this code, the values of parameter are hypothetical true pop. values of each column in the dataframe. I want to do this process 100x to get a distribution of bias estimates for each variable in the dataframe. However, I'm not sure how to fit all of this into a for loop (what I think would be the way to go) so the final output is a dataframe with 100 rows of bias measurements for each iris variable.
Any help with this would be greatly appreciated.
CodePudding user response:
Here is one way to do that. I've made some minor changes to your code, and wrapped it in a function. Then, use lapply
over a sequence say 1:10
or 1:100
, each time running your function, and feeding the result to your bias
function from the SimDesign
package. Then row bind the resulting list
library(dplyr)
get_samples <- function(df, size=4, n=1000) {
storage<- list()
counter<- 0
for (i in 1:1000) {
tempsample<- df[sample(1:nrow(df), size, replace=F),]
storage[[i]]=tempsample
counter<- counter 1
}
results<- do.call(rbind, storage)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/size)),each = size))
final_results<- aggregate(results_2[,1:size], list(results_2$Aggregate), mean)
return(final_results)
}
iris=iris
replicates = lapply(1:10, function(x) {
result = get_samples(iris)
(bias(result[,2:5], parameter=c(5,3,2,1), type='relative'))*100
})
replicates = do.call(rbind, replicates)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width
[1,] 41.50617 3.292500 86.77408 8.859333
[2,] 43.26058 2.763500 90.20758 10.825917
[3,] 43.46642 3.551750 90.11767 10.576250
[4,] 41.94683 2.970833 86.89625 8.817000
[5,] 42.08733 3.380917 86.78642 8.996667
[6,] 42.13050 2.942250 88.02983 9.707500
[7,] 43.07383 2.775500 89.04583 10.102083
[8,] 44.10192 2.895167 91.27208 11.188500
[9,] 41.29408 2.314750 87.59208 9.244333
[10,] 42.77450 2.781583 90.37342 10.789500