I'm a bit stumped. I'm trying to write code that runs Monte Carlo simulations of increasing sample size until certain conditions are met. First off, the bit of code that I know does work:
##Step 0 - load packages##
library(tidyverse)
library(ggplot2)
library(ggthemes)
##Step 1 - Define number of cycles per simulation##
ncycles <- 250000
##Step 2 - Define function for generating volumes and checking proportion of failed cycles##
volSim <- function(ncycles){
tols <- rnorm(ncycles,0,0.3) #Generate n unique tolerances
vols <- 0 #Establish vols variable within function
for (tol in 2:ncycles){ #for loop creates n unique volumes from tolerances
vols[tol] <- 2.2 + tols[tol] - tols[tol-1]
}
cell <- rnorm(1,3.398864,0.4810948) #Generate a unique threshold
return(c(mean(vols>cell),mean(vols>cell*2),mean(vols>cell*20))) #Output a vector of failure rate
}
This works fine and outputs three values: the proportions of events that exceed 1, 2 and 20 times the threshold. Now, for the bit that's not behaving:
##Step 3 - Define a function to run multiple iterations of simulation and check convergence ##
regres <- function(ncycles){
#Establish parameters used within function#
converged <- FALSE
fail_rate_5k <- 0
se_5k <- 0
ncells <- 0
fail_rate_10k <- 0
se_10k <- 0
fail_rate_100k <- 0
se_100k <- 0
n <- 0
while ((converged == FALSE & n<6) | n<4){
n <- n + 1
res <- replicate(2^(n+5),volSim(ncycles))
fail_rate_5k[n] <- mean(res[1,]>0)
se_5k[n] <- sqrt(fail_rate_5k[n]*(1-fail_rate_5k[n])/2^(n+5))
ncells[n] <- 2^(n+5)
fail_rate_10k[n] <- mean(res[2,]>0)
se_10k[n] <- sqrt(fail_rate_10k[n]*(1-fail_rate_10k[n])/2^(n+5))
fail_rate_100k[n] <- mean(res[3,]>0)
se_100k[n] <- sqrt(fail_rate_100k[n]*(1-fail_rate_100k[n])/2^(n+5))
if((fail_rate_5k[n] <= 0 | se_5k[n] < 0.5*fail_rate_5k[n]) &
(fail_rate_10k[n] <= 0 | se_10k[n] < 0.5*fail_rate_10k[n]) &
(fail_rate_100k[n] <= 0 | se_100k[n] < 0.5*fail_rate_100k[n])){
converged <- TRUE}
else {converged <- FALSE}
return(data.frame(k5 = fail_rate_5k, se_k5 = se_5k, ncells_k5 = ncells, k10 = fail_rate_10k, se_k10 = se_10k, ncells_k10 = ncells, k100 = fail_rate_100k, se_k100 = se_100k, ncells_k100 = ncells))}
}
The intention is that the simulation repeats at increasing sample sizes until, for every fail rate (5k, 10k, 100k), either the standard error is less than half of the fail rate or the fail rate itself is zero (to avoid a division-by-zero scenario). Two caveats: the simulation must run at least four times (the n<4 condition in the while loop) and must stop after a maximum of six.
Now, if I run the code inside the regres function in isolation (with ncycles set to 250000), I get a nice data frame with 5 rows, and I can see that n = 5, converged = TRUE, and everything else I expect to happen within the function happens just fine. If I run result <- regres(ncycles), however, it outputs a single-row data frame every time. The while loop is stopping at n = 1 despite the n<4 condition. I cannot for the life of me figure out why the behaviour when the function is called differs from the behaviour when the code inside it is run in isolation.
While I'm really looking to find out why this method is not working, if the method itself is completely outlandish I'm open to using a different approach entirely too.
CodePudding user response:
Your return statement is inside the while loop. It returns the data.frame at the end of the first iteration (essentially a break before the loop condition is ever checked a second time), so the function never gets past n = 1.
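To see the effect in isolation, here is a minimal sketch using two toy functions. The names count_inside and count_outside are hypothetical and are not part of the question's code; the only difference between them is where return() sits.

```r
# Hypothetical toy functions (not from the question) that differ only
# in the position of return() relative to the while loop.
count_inside <- function(limit) {
  n <- 0
  while (n < limit) {
    n <- n + 1
    return(n)  # exits the whole function during the first iteration
  }
}

count_outside <- function(limit) {
  n <- 0
  while (n < limit) {
    n <- n + 1
  }
  return(n)  # runs once, after the loop has finished
}

count_inside(4)   # 1
count_outside(4)  # 4
```

The same thing happens in regres: return() ends the function call, not just the current iteration.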
Try:
...
converged <- TRUE}
else {converged <- FALSE}
}
return(data.frame(k5 = fail_rate_5k, se_k5 = se_5k, ncells_k5 = ncells, k10 = fail_rate_10k, se_k10 = se_10k, ncells_k10 = ncells, k100 = fail_rate_100k, se_k100 = se_100k, ncells_k100 = ncells))
}
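Since you said you are open to a different approach: separately from the return() issue, the for loop in volSim could probably be replaced with vectorised indexing, which tends to be much faster at 250000 cycles. This is only a sketch, and it assumes the volume formula is 2.2 + tols[i] - tols[i-1] and that the first volume stays 0, as in your loop. The names volSim_loop and volSim_vec are mine, used here just to compare the two versions.

```r
# Loop version, following the question's volSim
# (assumes the formula is 2.2 + tols[i] - tols[i-1]).
volSim_loop <- function(ncycles) {
  tols <- rnorm(ncycles, 0, 0.3)
  vols <- 0
  for (tol in 2:ncycles) {
    vols[tol] <- 2.2 + tols[tol] - tols[tol - 1]
  }
  cell <- rnorm(1, 3.398864, 0.4810948)
  c(mean(vols > cell), mean(vols > cell * 2), mean(vols > cell * 20))
}

# Vectorised sketch: tols[-1] is elements 2..ncycles and
# tols[-ncycles] is elements 1..(ncycles - 1), so the subtraction
# pairs each tolerance with its predecessor. Requires ncycles >= 2.
volSim_vec <- function(ncycles) {
  tols <- rnorm(ncycles, 0, 0.3)
  vols <- c(0, 2.2 + tols[-1] - tols[-ncycles])
  cell <- rnorm(1, 3.398864, 0.4810948)
  c(mean(vols > cell), mean(vols > cell * 2), mean(vols > cell * 20))
}

# With the same seed, both versions draw the same random numbers,
# so their outputs can be compared directly.
set.seed(1); loop_result <- volSim_loop(1000)
set.seed(1); vec_result <- volSim_vec(1000)
identical(loop_result, vec_result)
```

If the comparison holds for your real parameters, volSim_vec could be dropped in wherever regres calls volSim.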