I am trying to run a more complicated simulation and save the results of reach replication to a data frame. A very simple abstraction of what I am trying to do is like:
sim.res <- data.frame(mn1 = mn1, mn2 = mn2, rep = rep)
for (k in 1:10){
set.seed(k*8)
mn1 <- mean(rnorm(25, 1, 1))
mn2 <- mean(rnorm(25, 2, 1))
sim.res$mn1 <- mn1
sim.res$mn2 <- mn2
sim.res$rep <- k
}
The output should be a dataframe with three columns and, in this case 10 rows, with the results of each replication stored in each row. What I get is the 10th replication. Where am I going wrong?
Thanks.
CodePudding user response:
You're overwriting the entirity of each column each time you iterate through the loop because R is vectorised (runs on columns) by default. Also, theres no need to reset your seed each time you iterate through the loop. Try something like this.
library(tidyverse)
set.seed(123)
lapply(
1:10,
function(k) {
data.frame(
mn1 = mean(rnorm(25, 1, 1)),
mn2 = mean(rnorm(25, 2, 1)),
rep = k
)
}
) %>%
bind_rows()
mn1 mn2 rep
1 0.9666697 2.102137 1
2 1.0102410 2.282576 2
3 0.7231910 1.769008 3
4 1.2152427 1.862371 4
5 1.1074981 1.875461 5
6 1.1727220 2.326180 6
7 0.9789770 2.025131 7
8 1.0981718 1.752828 8
9 0.9402713 1.928519 9
10 1.2714378 2.283175 10
Or, taking advantage of the fact that the mean of n draws from N(mu, sd) is N(mu, sd/sqrt(n)),
data.frame(
rep=rep(1:10),
mn1=rnorm(10, 1, sqrt(1/25)),
mn2=rnorm(10, 2, sqrt(1/25))
)
rep mn1 mn2
1 1 1.2053570 2.176493
2 2 1.1502123 2.041120
3 3 0.6981667 1.876713
4 4 0.9809705 1.853040
5 5 0.8208104 1.973639
6 6 0.5858498 2.062003
7 7 1.0300240 1.792064
8 8 0.9841577 1.963138
9 9 0.9805261 2.193453
10 10 1.0432305 1.978344
CodePudding user response:
In R, the most basic object is still a vector, so this doesn't require a loop at all.
When you are setting sim.res$mn1 <- mn1
you are putting mn1 into the data.frame, but since mn1 is only length 1 and sim.res has 10 rows, R repeats the value 10 times and puts it in your data.frame.
Also set.seed only needs to be used once for reproducibility.
set.seed(42)
sim.res <- data.frame(mn1 = replicate(10, mean(rnorm(25, 1, 1))),
mn2 = replicate(10, mean(rnorm(25, 2, 1))),
rep = 1:10)
sim.res
mn1 mn2 rep
1 1.0818928 1.934759 1
2 1.0153544 2.185476 2
3 0.9898056 1.523066 3
4 0.9132660 2.229027 4
5 1.1003520 1.754576 5
6 0.7468452 2.211500 6
7 0.8480360 2.303661 7
8 1.2965247 2.014611 8
9 0.7511059 1.801408 9
10 1.0231177 1.924155 10
replicate
is just a wrapper for sapply that makes this code easier to follow.
Maybe technically that's using a loop, but it sure doesn't look like one.