Is there a way to sample from number ranges with given probabilities for the ranges? This does not work because c()
(of course) creates a single vector of all 47 numbers, so the 5 values in prob no longer match the length of x.
set.seed(1)
sample(c(18:24, 25:34, 35:44, 45:54, 55:64),
100,
replace = TRUE,
prob = c(0.2, 0.3, 0.35, 0.1, 0.05))
CodePudding user response:
What you need is for prob to be repeated for each value in each range, with each range's probability divided by the length of that range so that it is spread evenly over the range's values.
ranges <- list(18:24, 25:34, 35:44, 45:54, 55:64)
lens <- lengths(ranges)
probs <- c(0.2, 0.3, 0.35, 0.1, 0.05)
set.seed(1)
samp <- sample(unlist(ranges), size=1e6, replace=TRUE, prob=rep(probs / lens, times = lens))
table(cut(samp, c(17, 24, 34, 44, 54, 65)))
# (17,24] (24,34] (34,44] (44,54] (54,65]
# 200258 299917 349943 100154 49728
(Roughly the expected ratios.)
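As a quick sanity check on the weights used above, here is a minimal sketch that rebuilds the per-value weights and confirms that they add back up to the requested probability for each range. The names w, range_id and mass are introduced here for illustration only and are not part of the answer.

```r
ranges <- list(18:24, 25:34, 35:44, 45:54, 55:64)
lens   <- lengths(ranges)
probs  <- c(0.2, 0.3, 0.35, 0.1, 0.05)

# One weight per individual value: the range's probability split evenly
# across the values in that range.
w <- rep(probs / lens, times = lens)

# Label each value with the index of the range it came from.
range_id <- rep(seq_along(ranges), times = lens)

# Summing the weights within each range should give back probs,
# and all weights together should sum to 1 (up to floating-point error).
mass <- as.numeric(tapply(w, range_id, sum))
stopifnot(isTRUE(all.equal(mass, probs)))
stopifnot(isTRUE(all.equal(sum(w), 1)))
```

Because the weight is probs / lens, a value in a short range such as 18:24 gets a larger individual weight than a value in a 10-number range with the same range probability.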
CodePudding user response:
I guess I would approach it like this:
set.seed(1)
r <- list(18:24, 25:34, 35:44, 45:54, 55:64)
p <- c(0.2, 0.3, 0.35, 0.1, 0.05)
# round() guards against floating-point products such as 100 * 0.29 falling just below a whole number
sample(unlist(lapply(seq_along(r), \(x) sample(r[[x]], size = round(100 * p[[x]]), replace = TRUE))))
This places the ranges in a list and then, along the length of that list, samples from each range a number of values equal to its share of the total, as given by p. The outer sample() call shuffles the combined vector at the end.
Output: a shuffled vector of 100 values, with exactly 20, 30, 35, 10 and 5 of them drawn from the five ranges in order. Note that the per-range counts are fixed here rather than random.
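If the number of draws per range should itself be random rather than fixed, one possible alternative (a different technique from the one in this answer) is a two-stage draw: first pick a range for each draw using the range probabilities, then pick a value uniformly within that range. This is only a sketch; n, idx and draw_one are hypothetical names chosen for the example.

```r
ranges <- list(18:24, 25:34, 35:44, 45:54, 55:64)
probs  <- c(0.2, 0.3, 0.35, 0.1, 0.05)
n      <- 100

set.seed(1)
# Stage 1: pick a range index for each draw, using the range probabilities.
idx <- sample(seq_along(ranges), size = n, replace = TRUE, prob = probs)

# Stage 2: pick one value uniformly from the chosen range.
# Indexing via sample.int() avoids sample()'s special handling of
# length-one numeric input, in case a range ever holds a single value.
draw_one <- function(i) {
  r <- ranges[[i]]
  r[sample.int(length(r), 1)]
}
samp <- vapply(idx, draw_one, integer(1))
```

Calling table(idx) shows how many draws landed in each range; over many draws these counts should approach n * probs.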