I am working with the R programming language. I created the following loop that generates 1000 random numbers and then repeats this process 10 times:
results <- list()
for (i in 1:10){
a = rnorm(1000,10,1)
b = rnorm(1000,10,1)
d_i = data.frame(a,b)
d_i$index = 1:nrow(d_i)
d_i$iteration = as.factor(i)
results[[i]] <- d_i
}
results_df <- do.call(rbind.data.frame, results)
Question: I would like to change this loop such that instead of only generating 1000 random numbers, it keeps generating random numbers until a certain condition is met, for example: KEEP generating random numbers UNTIL d_i$a > 10 AND d_i$b > 10.
Using a while() loop, I tried this:
results <- list()
for (i in 1:10){
while (d_i$a > 10 & d_i$b >10) {
a = rnorm(1000,10,1)
b = rnorm(1000,10,1)
d_i = data.frame(a,b)
d_i$index = 1:nrow(d_i)
d_i$iteration = as.factor(i)
results[[i]] <- d_i
}
}
results_df <- do.call(rbind.data.frame, results)
Problem: However, this returns the following warnings (10 times):
Warning messages:
1: In while (d_i$a > 10 & d_i$b > 10) { :
the condition has length > 1 and only the first element will be used
And produces an empty table:
> results_df
data frame with 0 columns and 0 rows
Can someone please help me fix this problem?
Thanks!
Answer:
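The warnings in the original post occur because d_i$a and d_i$b are vectors with 1,000 elements, while 10 is a scalar. The expression d_i$a > 10 & d_i$b > 10 therefore produces 1,000 TRUE/FALSE values, and while() uses only the first one, so R effectively compares just the first element of d_i$a and the first element of d_i$b with 10. (Since R 4.2.0, a condition of length greater than one is an error rather than a warning.) Note also that while() keeps looping as long as its condition is TRUE, so the loop in the question runs while both values exceed 10, which is the opposite of "until". Because the first row of the d_i left over from the first loop did not meet the condition, the loop body never ran and results stayed empty.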
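A minimal illustration, using two short made-up vectors in place of the 1,000-element columns: the elementwise & returns one logical value per row, while any() and all() collapse that vector into the single TRUE/FALSE value that while() and if() expect.

```r
# Two short example vectors standing in for d_i$a and d_i$b (made-up values)
a <- c(11.2, 9.5, 10.4)
b <- c(10.8, 10.1, 9.7)

cond <- a > 10 & b > 10   # elementwise: one TRUE/FALSE per row
print(cond)               # TRUE FALSE FALSE
length(cond)              # 3, so while (cond) warns (R < 4.2.0) or errors (R >= 4.2.0)

# Collapsing to a single logical value gives a valid while()/if() condition
any(cond)   # TRUE:  at least one row meets both conditions
all(cond)   # FALSE: not every row does
```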
One way to resolve the warning is to make the condition a single value by comparing one number at a time with the scalar 10. This requires restructuring the code to generate the random numbers one at a time. From the description in the original post, it is unclear whether the intent was to compare individual values or entire vectors.
I'll simplify the problem by dropping the 10 replications, to illustrate how to keep adding random numbers to a data frame until a row has both a and b greater than 10.
First, we set a seed to make the answer reproducible, and then initialize some objects. By setting a and b to 0, we ensure that the while() loop will execute at least once.
set.seed(950141238) # for reproducibility
results <- list()
a <- 0 # initialize a to a number < 10
b <- 0 # initialize b to a number < 10
i <- 1 # set a counter
Having initialized a and b, the while() condition evaluates to TRUE, so the loop generates two random numbers, assigns an index value, and writes them as a one-row data frame to the results list. The condition tells the loop to keep iterating if either a is less than or equal to 10 or b is less than or equal to 10. It stops when both a and b are greater than 10.
while (a <= 10 | b <= 10) {
  a <- rnorm(1, 10, 1) # generate 1 random number with mean of 10 and sd of 1
  b <- rnorm(1, 10, 1) # ditto
  results[[i]] <- data.frame(index = i, a, b)
  i <- i + 1 # increment i
}
The loop stops executing after the ninth iteration, as we can see by combining the individual rows with do.call() and rbind() and printing the resulting data frame.
df <- do.call(rbind,results)
df
...and the output:
> df
index a b
1 1 8.682442 8.846653
2 2 9.204682 8.501692
3 3 8.886819 10.488972
4 4 11.264142 8.952981
5 5 9.900112 10.918042
6 6 9.185120 10.625667
7 7 9.620793 10.316724
8 8 11.718397 9.256835
9 9 10.034793 11.634023
>
Notice that the last row in the data frame has values greater than 10 for both a and b.
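The loop condition a <= 10 | b <= 10 is the logical negation of the stopping rule "both a and b exceed 10" (De Morgan's law). The sketch below checks that equivalence on a few hand-picked scalar pairs. The helper names keep_going() and done() are hypothetical, and the sketch uses the scalar operators || and && instead of | (a style choice, not what the answer uses); they operate on single values, which is what while() needs.

```r
# Loop condition as written in the answer: keep going while either value is <= 10
keep_going <- function(a, b) a <= 10 || b <= 10

# Stopping rule stated directly: stop once both values exceed 10
done <- function(a, b) a > 10 && b > 10

# Hand-picked pairs covering all four combinations, plus a boundary value of exactly 10
pairs <- list(c(9, 9), c(11, 9), c(9, 11), c(11, 11), c(10, 12))

for (p in pairs) {
  # keep_going() should always be the exact negation of done()
  stopifnot(identical(keep_going(p[1], p[2]), !done(p[1], p[2])))
}

sapply(pairs, function(p) keep_going(p[1], p[2]))  # TRUE TRUE TRUE FALSE TRUE
```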
Multiple replications of the while loop
To repeat the process 10 times, as is done in the original post, we wrap the operation in a for() loop and add a second list, combined_results, to save the results from each iteration.
set.seed(950141238) # for reproducibility
combined_results <- list()
for (iteration in 1:10) {
  results <- list()
  a <- 0 # initialize a to a number < 10
  b <- 0 # initialize b to a number < 10
  i <- 1 # set a counter
  while (a <= 10 | b <= 10) {
    a <- rnorm(1, 10, 1) # generate 1 random number with mean of 10 and sd of 1
    b <- rnorm(1, 10, 1) # ditto
    results[[i]] <- data.frame(iteration, index = i, a, b)
    i <- i + 1 # increment i
  }
  combined_results[[iteration]] <- do.call(rbind, results)
}
df <- do.call(rbind,combined_results)
df[df$iteration < 5,]
...and the output for the first 4 iterations of the outer loop:
> df[df$iteration < 5,]
iteration index a b
1 1 1 8.682442 8.846653
2 1 2 9.204682 8.501692
3 1 3 8.886819 10.488972
4 1 4 11.264142 8.952981
5 1 5 9.900112 10.918042
6 1 6 9.185120 10.625667
7 1 7 9.620793 10.316724
8 1 8 11.718397 9.256835
9 1 9 10.034793 11.634023
10 2 1 11.634331 9.746453
11 2 2 9.195410 7.665265
12 2 3 11.323344 8.279968
13 2 4 9.617224 11.792142
14 2 5 9.360307 11.166162
15 2 6 7.963320 11.325801
16 2 7 8.022093 8.568503
17 2 8 10.440788 9.026129
18 2 9 10.841408 10.033346
19 3 1 11.618665 10.179793
20 4 1 10.975061 9.503309
21 4 2 10.209288 12.409656
>
Again, the last row of each iteration (rows 9, 18, 19, and 21) has values greater than 10 for both a and b.
Note that this approach does not take advantage of vectorized operations in R: instead of generating 1,000 random numbers with each call to rnorm(), the while() loop generates a single random number per call. Every call to rnorm(), and every one-row data frame built inside the loop, carries a fixed overhead, so code that minimizes the number of times rnorm() executes is usually faster.
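A sketch of one way to keep the calls to rnorm() vectorized while still stopping at the first qualifying row: draw a whole batch, look for the first row where both values exceed 10, and only draw another batch if none qualifies. The function draw_until_both_exceed() and its batch_size, threshold, mu, and sigma arguments are hypothetical names chosen for illustration. Because the draws happen in a different order than in the one-at-a-time loop, the same seed will not reproduce the output shown above.

```r
# Hypothetical helper: draw (a, b) pairs in batches until some row has
# a > threshold and b > threshold, then return all rows up to and including
# the first such row.
draw_until_both_exceed <- function(batch_size = 1000, threshold = 10, mu = 10, sigma = 1) {
  a <- numeric(0)
  b <- numeric(0)
  repeat {
    # One vectorized call per column per batch, instead of one call per value
    a <- c(a, rnorm(batch_size, mu, sigma))
    b <- c(b, rnorm(batch_size, mu, sigma))
    hit <- which(a > threshold & b > threshold)
    if (length(hit) > 0) break
  }
  keep <- seq_len(hit[1])  # rows 1 .. first qualifying row
  data.frame(index = keep, a = a[keep], b = b[keep])
}

set.seed(950141238)  # same seed as above, but a different draw order
res <- lapply(1:10, function(iteration) {
  data.frame(iteration = iteration, draw_until_both_exceed())
})
df <- do.call(rbind, res)
```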
Answer:
I hope these comments help you follow how it works. It mainly makes use of repeat, which is just an infinite loop that can be stopped with the break keyword.
results <- list()
for (i in 1:10) {
  # do until break
  repeat {
    # generate many random numbers
    a = rnorm(1000, 10, 1)
    b = rnorm(1000, 10, 1)
    # does any pair meet the requirement?
    if (any(a > 10 & b > 10)) {
      # put it in a data.frame
      d_i = data.frame(a, b)
      # end repeat
      break
    }
  }
  # select all rows until the first time the requirement is met
  # it must be met, otherwise the loop would not have ended
  d_i <- d_i[1:which(d_i$a > 10 & d_i$b > 10)[1], ]
  # prep other variables
  d_i$index = seq_len(nrow(d_i))
  d_i$iteration = as.factor(i)
  results[[i]] <- d_i
}
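A quick calculation suggests why the repeat loop above almost always exits on its first pass. Since a and b are drawn independently from a normal distribution with mean 10, each exceeds 10 with probability 1/2, so a given pair qualifies with probability 1/4, and a batch of 1,000 pairs contains no qualifying pair only with vanishingly small probability:

```latex
P(a_j > 10 \wedge b_j > 10) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4},
\qquad
P(\text{no qualifying pair among } 1000) = \left(\tfrac{3}{4}\right)^{1000} \approx 1.15 \times 10^{-125}.
```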
Answer:
To break out of a loop (while or for), simply add a break after an if condition.
out <- vector("integer", 26)
for (i in seq_along(letters)) {
  if (letters[i] == "t") break
  out[i] <- i + 1
}
out
#> [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 0 0 0 0 0 0
This breaks out of the loop as soon as letters[i] is "t". From ?break: "control is transferred to the first statement outside the inner-most loop."
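To make "inner-most loop" concrete, here is a small made-up example with two nested for() loops: break leaves only the inner loop, and the outer loop carries on with its next value.

```r
visited <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break  # exits only the inner for (j ...) loop
    visited <- c(visited, paste0(i, "-", j))
  }
  # still runs after every inner break, because the outer loop continues
  visited <- c(visited, paste0("end of i = ", i))
}
visited
```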
However, from your question it is not entirely clear why you are trying this. Such control flow might not be the appropriate solution, as a vectorized solution might exist. Further, beware of doing unnecessary things inside a loop; it is a common cause of slow-running code. Here we can take some things out of the for loop, such as computing d_i$iteration and d_i$index, and still end up with the same result. Have a look at the Third Circle ("Failing to Vectorize") of Patrick Burns' The R Inferno.
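A sketch of what moving work out of the loop could look like for the question's original fixed-size version: the loop (written here with lapply(), a style choice) only draws the numbers, and the index and iteration columns are added once, in a single vectorized step each, after the pieces are combined. The names n and n_iter and the seed value are assumptions for illustration.

```r
set.seed(123)  # arbitrary seed, for reproducibility
n <- 1000      # draws per iteration, as in the question
n_iter <- 10   # number of iterations, as in the question

# Inside the loop: only generate the random numbers
results <- lapply(seq_len(n_iter), function(i) {
  data.frame(a = rnorm(n, 10, 1), b = rnorm(n, 10, 1))
})
results_df <- do.call(rbind, results)

# Outside the loop: add the bookkeeping columns once, vectorized
results_df$index <- rep(seq_len(n), times = n_iter)
results_df$iteration <- factor(rep(seq_len(n_iter), each = n))

head(results_df)
```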