I am trying to generate three probabilities that will necessarily sum to 1. I tried the following:
x <- sample(seq(0.05,0.95,0.05), size=3)
x <- boom/sum(boom)
However, I want the generated numbers to be fractions to two decimal places, i.e, numbers from 0.05 to 0.95 in steps of 0.05 (and so that they also sum to 1). The problem with the numbers generated by the code above is that they can be with many or infinite decimal places. How can I achieve what I need? Has anyone done anything similar?
CodePudding user response:
You could do
diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))
#> [1] 0.05 0.75 0.2
This works by choosing 2 random double-digit numbers between 0.05 and 0.95 and sorting them. The first number is the first sample, the second is the distance between the two numbers, and the third is the distance between the second number and 1, so they necessarily add to 1.
Note that if all the numbers have to add up to 1, and none can be lower than 0.05, this means none of the numbers can be 0.95.
For example, if we want ten such samples, we can do:
t(replicate(10, diff(c(0, sort(sample(seq(0.05, 0.95, 0.05), 2)), 1))))
#> [,1] [,2] [,3]
#> [1,] 0.25 0.15 0.60
#> [2,] 0.25 0.25 0.50
#> [3,] 0.30 0.05 0.65
#> [4,] 0.25 0.50 0.25
#> [5,] 0.50 0.05 0.45
#> [6,] 0.45 0.20 0.35
#> [7,] 0.10 0.85 0.05
#> [8,] 0.45 0.50 0.05
#> [9,] 0.15 0.40 0.45
#> [10,] 0.40 0.30 0.30
Created on 2022-06-17 by the reprex package (v2.0.1)
CodePudding user response:
You can make many distributions over the discrete simplex you describe. Here is an alternative and generalizable way of generating a uniform distribution across all triples (p1, p2, p3) in your simplex:
p <- seq(0.05, 0.95, by = 0.05)
states <- expand.grid(p, p, p) # Generate all combinations
states <- states[rowSums(round(states, 2)) == 1, ] # Select simplex where elements sum to 1
states[sample(1:nrow(states), 1), ] # Sample a random element
This should work well for triples, but if you want to simulate from higher order simplexes (e.g. 7 probabilities or more) one should probably construct the data frame of states in a more efficient way.
CodePudding user response:
Try this
fn <- function(){
s <- seq(0.05,0.90,0.05)
sm <- sample(s , size = 2)
while(sm[1] sm[2] >= 1) sm <- sample(s , size = 2)
ans <- c(sm , 1 - sum(sm))
ans
}
by calling fn()
you get the required sample
fn()
#> 0.20 0.65 0.15
fn()
#> 0.40 0.55 0.05