For my problem, I have 4 lists, each containing 10,000 elements. Let the lists be a, b, c, d. I can calculate Probability(a<b) by taking the mean: mean(a<b). If I understand correctly, this compares the 10,000 elements of a and b in order and tells me for what fraction of them a<b holds (the count divided by the total number of elements).
Now I want to compute Probability(a<b<c<d): I want R to compare the 10,000 elements in order and tell me for what fraction of them a<b<c<d holds. However, I can't do this with the mean function, since it doesn't accept more than one < sign. How can I use the mean function here? I'm an absolute beginner in R, but logically, I feel this should be straightforward, without looping over everything and keeping a count variable.
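For readers following along, here is a minimal sketch of what mean(a<b) does, using tiny made-up vectors (the values are illustrative only, not the asker's data). The comparison is element-wise and returns a logical vector; mean() then treats TRUE as 1 and FALSE as 0. The chained form a<b<c is rejected by R's parser, which is why the pairwise comparisons have to be combined some other way.

```r
# Tiny hand-checkable vectors (illustrative values, not from the question)
a <- c(1, 5, 3, 8)
b <- c(2, 4, 6, 9)

lt <- a < b        # element-wise comparison: TRUE FALSE TRUE TRUE
prop <- mean(lt)   # TRUE counts as 1, FALSE as 0, so this is 3/4

print(lt)
print(prop)        # 0.75
```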
CodePudding user response:
How about this:
a <- runif(10000)
b <- runif(10000)
c <- runif(10000)
d <- runif(10000)
mean(a<b & b<c & c<d)
#> [1] 0.0425
Created on 2022-12-06 by the reprex package (v2.0.1)
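As a sanity check on the number above: if the four vectors are independent draws from the same continuous distribution (true for the runif example, but only an assumption for the asker's real data), all 4! = 24 orderings are equally likely, so the proportion should be close to 1/24, about 0.0417. A rough sketch with an arbitrary seed and a larger sample:

```r
set.seed(42)   # arbitrary seed, chosen only for reproducibility
n <- 1e5       # larger than the question's 10,000 to reduce noise

a <- runif(n)
b <- runif(n)
c <- runif(n)
d <- runif(n)

est <- mean(a < b & b < c & c < d)   # simulated proportion
expected <- 1 / factorial(4)         # 1/24 under the i.i.d. assumption

print(c(estimate = est, expected = expected))
```

If your own data give a value far from 1/24, that just means the lists are not exchangeable, which may well be the point of computing it.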
CodePudding user response:
You need to use logical operators, which are & (and) and | (or). These two are the element-wise (vectorized) versions, as opposed to && and ||, which work on single values. An example:
set.seed(7*11*13)
n <- 100
# draw n distinct integers from 1:1000 for each vector
a <- sample(1:1000, n)
b <- sample(1:1000, n)
c <- sample(1:1000, n)
d <- sample(1:1000, n)
mean( (a<b)&(b<c)&(c<d) )
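If you ever have more than four vectors, writing out every pair gets tedious. One possible generalization is sketched below; prop_increasing is a name I made up for illustration, not an existing function. It compares each vector with the next one via Map and combines the results with element-wise & via Reduce.

```r
# Hypothetical helper (the name is my own): proportion of positions at
# which the vectors are strictly increasing in the order given.
prop_increasing <- function(...) {
  v <- list(...)
  # need at least two vectors, all of the same length
  stopifnot(length(v) >= 2, length(unique(lengths(v))) == 1)
  # compare each vector with its successor: a list of logical vectors
  steps <- Map(`<`, v[-length(v)], v[-1])
  # combine with element-wise AND, then average
  mean(Reduce(`&`, steps))
}

# Small hand-checkable example: positions 1 and 4 are increasing
a <- c(1, 2, 3, 9)
b <- c(2, 3, 1, 10)
c <- c(3, 4, 5, 11)
d <- c(4, 1, 6, 12)
res <- prop_increasing(a, b, c, d)
print(res)   # 0.5
```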
CodePudding user response:
Using Rfast::coldiffs (with A, B, C, D as defined under Data at the end):
mean(rowSums(Rfast::coldiffs(matrix(c(A, B, C, D), length(A), 4)) > 0) == 3)
#> [1] 0.0397
Or multiplication:
mean((A<B)*(B<C)*(C<D))
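In case it isn't obvious why multiplication works: arithmetic coerces TRUE to 1 and FALSE to 0, so a product of logicals is 1 only where every factor is TRUE, which matches element-wise &. A small sketch with made-up values:

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

prod_xy <- x * y   # integer vector: 1 0 0 0
and_xy  <- x & y   # logical vector: TRUE FALSE FALSE FALSE

print(prod_xy)
print(and_xy)
print(mean(prod_xy) == mean(and_xy))   # both means are 0.25
```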
For whatever reason, multiplying/adding logicals tends to be faster than & / | in possibly every case I've encountered:
microbenchmark::microbenchmark(
  coldiffs = mean(rowSums(Rfast::coldiffs(matrix(c(A, B, C, D), length(A), 4)) > 0) == 3),
  logical = mean(A<B & B<C & C<D),
  multiplication = mean((A<B)*(B<C)*(C<D)),
  check = "identical"
)
#> Unit: microseconds
#>            expr     min       lq     mean  median       uq      max neval
#>        coldiffs 253.701 384.6515 454.6581 402.551 429.6010 6795.401   100
#>         logical 131.201 160.5505 200.4389 166.051 173.1515 3510.901   100
#>  multiplication 100.601 122.5010 126.3041 126.001 132.2010  241.400   100
Data
A <- runif(1e4)
B <- runif(1e4)
C <- runif(1e4)
D <- runif(1e4)