I am working with the R programming language.
I would like to generate random numbers: a1, a2, a3, b1, b2, b3.
I would like there to be a condition such that:
- a3 > a2 > a1
- b3 > b2 > b1
I do not know how to do this directly, so I tried to generate a large data frame of numbers and only keep rows that match this condition:
a1 = rnorm(100000,10,10)
a2 = rnorm(100000,10,10)
a3 = rnorm(100000,10,10)
b1 = rnorm(100000,10,10)
b2 = rnorm(100000,10,10)
b3 = rnorm(100000,10,10)
my_data = data.frame(a1, a2, a3, b1, b2,b3)
This data frame looks like this:
head(my_data)
a1 a2 a3 b1 b2 b3
1 5.6713342 -4.5930442 6.063861 28.9258586 -1.073999 23.7398862
2 17.5791993 5.1482061 6.683438 9.2969640 6.438304 10.2569026
3 13.9389949 8.9943351 1.089840 12.9340164 22.099974 -0.6791567
4 16.0257008 10.4139726 18.469092 10.9470812 20.105047 0.4710750
5 -0.1370202 0.9112077 4.349729 11.9442915 22.318155 8.7671923
6 18.8508432 -3.6210024 3.022941 0.6319464 14.406452 25.2002712
I then tried to make an "indicator" variable that indicates whether a row should be deleted or kept based on whether or not it matches the conditions:
my_data$indicator_a2_a1 = ifelse(my_data$a2 > my_data$a1, "TRUE", "FALSE")
my_data$indicator_a3_a2 = ifelse(my_data$a3 > my_data$a2, "TRUE", "FALSE")
my_data$indicator_a3_a1 = ifelse(my_data$a3 > my_data$a1, "TRUE", "FALSE")
my_data$indicator_b2_b1 = ifelse(my_data$b2 > my_data$b1, "TRUE", "FALSE")
my_data$indicator_b3_b2 = ifelse(my_data$b3 > my_data$b2, "TRUE", "FALSE")
my_data$indicator_b3_b1 = ifelse(my_data$b3 > my_data$b1, "TRUE", "FALSE")
With these indicators, the data now looks like this:
a1 a2 a3 b1 b2 b3 indicator_a2_a1 indicator_a3_a2 indicator_a3_a1 indicator_b2_b1 indicator_b3_b2 indicator_b3_b1
1 5.6713342 -4.5930442 6.063861 28.9258586 -1.073999 23.7398862 FALSE TRUE TRUE FALSE TRUE FALSE
2 17.5791993 5.1482061 6.683438 9.2969640 6.438304 10.2569026 FALSE TRUE FALSE FALSE TRUE TRUE
3 13.9389949 8.9943351 1.089840 12.9340164 22.099974 -0.6791567 FALSE FALSE FALSE TRUE FALSE FALSE
4 16.0257008 10.4139726 18.469092 10.9470812 20.105047 0.4710750 FALSE TRUE TRUE TRUE FALSE FALSE
5 -0.1370202 0.9112077 4.349729 11.9442915 22.318155 8.7671923 TRUE TRUE TRUE TRUE FALSE FALSE
6 18.8508432 -3.6210024 3.022941 0.6319464 14.406452 25.2002712 FALSE TRUE FALSE TRUE TRUE TRUE
Finally, I isolated rows in which all indicators were TRUE:
final_file <- my_data[which(my_data$indicator_a2_a1 == "TRUE" & my_data$indicator_a3_a2 == "TRUE" & my_data$indicator_a3_a1 == "TRUE" & my_data$indicator_b2_b1 == "TRUE" & my_data$indicator_b3_b2 == "TRUE" & my_data$indicator_b3_b1 == "TRUE"), ]
dim(final_file)
[1] 2754 12
This accomplished the task, but I was wondering if there is a more "efficient" way to do it. For example, I randomly generated 100000 rows, but only 2754 of them (~2.8%) met the conditions I wanted. The other problem is that I had to manually create 6 indicator variables to make sure all conditions were respected. Had there been more conditions, I would have had to manually create many more indicator variables.
My Question: Is there a way to randomly generate data according to some conditions such that ALL rows produced would meet these conditions? Could this be done with a WHILE LOOP?
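For comparison, the filtering step described in the question can be written without any indicator columns. The following is a minimal sketch, assuming the same six independent normal columns as above; the fixed seed and the name `keep` are illustrative assumptions, not part of the original code. Since a3 > a1 already follows from a3 > a2 and a2 > a1, four comparisons are enough. Each ordering of three independent, identically distributed continuous draws is equally likely, so roughly 1/6 × 1/6 = 1/36 (about 2.8%) of rows should survive, which is consistent with the 2754 rows reported.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
n <- 100000

# Same setup as in the question: six independent normal columns.
my_data <- data.frame(
  a1 = rnorm(n, 10, 10), a2 = rnorm(n, 10, 10), a3 = rnorm(n, 10, 10),
  b1 = rnorm(n, 10, 10), b2 = rnorm(n, 10, 10), b3 = rnorm(n, 10, 10)
)

# a3 > a1 is implied by a3 > a2 and a2 > a1, so four comparisons suffice.
keep <- with(my_data, a1 < a2 & a2 < a3 & b1 < b2 & b2 < b3)
final_file <- my_data[keep, ]

# Share of rows kept; the expected share is 1/36.
mean(keep)
```

This keeps the rejection approach but removes the manual bookkeeping; it does not change how many rows are discarded.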
CodePudding user response:
Does simply generating a list of random numbers for a and b and then sorting each with the sort() function work for your use case? The following code satisfies your specified conditions:
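# Draw three values for each group
a = rnorm(3,10,10)
b = rnorm(3,10,10)
# Sort ascending so that a1 < a2 < a3 and b1 < b2 < b3
a.ordered = sort(a)
b.ordered = sort(b)
# One column of values, labelled by row name
df = data.frame(numbers = c(a.ordered,b.ordered),
                row.names = c("a1","a2","a3","b1","b2","b3"))
df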
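The code above produces a single set of six numbers. To get many rows in the layout from the question, the same sorting idea can be applied row by row. The sketch below is one way to do that in base R; `ordered_rows` and its arguments are hypothetical names introduced for illustration. Because the draws within a row are independent and identically distributed, sorting them should give the same distribution as keeping only the rows that happened to come out ordered, but without discarding anything.

```r
n <- 1000  # assumption: number of rows wanted

# Hypothetical helper: draw n rows of k normal values and sort each row.
# Assumes k >= 2.
ordered_rows <- function(n, k, prefix, mean = 10, sd = 10) {
  m <- matrix(rnorm(n * k, mean, sd), nrow = n, ncol = k)
  # apply() returns each sorted row as a column, so transpose back.
  m <- t(apply(m, 1, sort))
  colnames(m) <- paste0(prefix, seq_len(k))
  m
}

my_data <- as.data.frame(cbind(ordered_rows(n, 3, "a"), ordered_rows(n, 3, "b")))
head(my_data)
```

Every one of the `n` rows satisfies a1 < a2 < a3 and b1 < b2 < b3, and adding more ordered variables only means changing `k`.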
CodePudding user response:
A "direct" method could be creating your variables sequentially using tibble:
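library(tibble)

# tibble() evaluates its arguments in order, so each column can use the previous one
fun <- function(n) {
  tibble(a3 = rnorm(n),
         a2 = a3 - abs(rnorm(n)),   # subtract a positive amount, so a2 < a3
         a1 = a2 - abs(rnorm(n)),   # and a1 < a2
         b3 = rnorm(n),
         b2 = b3 - abs(rnorm(n)),
         b1 = b2 - abs(rnorm(n)))
}
fun(10)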
a3 a2 a1 b3 b2 b1
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.211 -0.901 -2.09 -0.988 -1.61 -2.40
2 -0.543 -2.04 -2.18 -0.0840 -1.06 -2.41
3 -0.190 -1.22 -1.41 -0.00393 -1.46 -1.73
4 2.11 1.36 1.20 -1.06 -2.21 -3.39
5 0.653 -0.156 -0.313 1.41 0.301 -0.539
6 -1.16 -1.46 -2.71 0.387 -1.40 -4.00
7 1.56 0.865 0.676 1.18 0.863 -0.296
8 1.01 0.544 0.0511 0.318 0.0864 -1.76
9 0.636 0.165 -1.83 0.929 0.905 0.210
10 0.633 -0.269 -1.01 0.466 -0.0685 -0.445
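One caveat with the sequential construction in fun() is that the columns no longer share a distribution: a3 is a standard normal draw, while a1 is that draw minus two absolute-value normal draws, so it is shifted downward and more spread out. That also differs from the rnorm(n, 10, 10) setup in the question. If that is acceptable, the same idea can be sketched in base R for any number of ordered columns; `ordered_seq` and its arguments are hypothetical names used only for this illustration.

```r
# Hypothetical helper: k columns that are ordered by construction, as in fun().
ordered_seq <- function(n, k, prefix) {
  out <- matrix(NA_real_, nrow = n, ncol = k)
  out[, k] <- rnorm(n)                  # largest value in each row
  for (j in rev(seq_len(k - 1))) {
    # Each earlier column is the next one minus a positive amount.
    out[, j] <- out[, j + 1] - abs(rnorm(n))
  }
  colnames(out) <- paste0(prefix, seq_len(k))
  out
}

my_data <- as.data.frame(cbind(ordered_seq(10, 3, "a"), ordered_seq(10, 3, "b")))
my_data
```

The columns come out as a1, a2, a3, b1, b2, b3 with every row satisfying both orderings, and no rows are thrown away.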