sample from number ranges with given probabilities for the ranges

Time:03-12

Is there a way to sample from number ranges with given probabilities for the ranges? The following does not work because c() (of course) flattens the ranges into a single vector of all the numbers, so the five probabilities no longer line up with the values.

set.seed(1)
sample(c(18:24, 25:34, 35:44, 45:54, 55:64),
       100,
       replace = TRUE,
       prob = c(0.2, 0.3, 0.35, 0.1, 0.05))
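
To see why the call fails, it may help to compare lengths: c() produces 47 individual values, while prob has only 5 entries. A minimal sketch (the names x, p, and res are introduced here for illustration only):

```r
# Flattened values versus per-range probabilities
x <- c(18:24, 25:34, 35:44, 45:54, 55:64)
p <- c(0.2, 0.3, 0.35, 0.1, 0.05)

length(x)  # 47 individual values
length(p)  # 5 probabilities, one per range

# sample() expects one probability per element of x, so this call errors;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(
  sample(x, 100, replace = TRUE, prob = p),
  error = function(e) conditionMessage(e)
)
res
```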

CodePudding user response:

What you need is for prob to have one entry per value: repeat each range's probability for every value in that range, divided by the length of the range so that each range's total probability stays the same.

ranges <- list(18:24, 25:34, 35:44, 45:54, 55:64)
lens <- lengths(ranges)
probs <- c(0.2, 0.3, 0.35, 0.1, 0.05)
set.seed(1)
samp <- sample(unlist(ranges), size=1e6, replace=TRUE, prob=rep(probs / lens, times = lens))
table(cut(samp, c(17, 24, 34, 44, 54, 65)))
# (17,24] (24,34] (34,44] (44,54] (54,65] 
#  200258  299917  349943  100154   49728 

(Roughly the expected ratios.)
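
As a quick sanity check on the weighting, the per-value weights can be summed back up by range; they should reproduce the original range probabilities. This is only a sketch, and the names w, grp, and range_totals are illustrative rather than part of the answer above:

```r
ranges <- list(18:24, 25:34, 35:44, 45:54, 55:64)
lens   <- lengths(ranges)
probs  <- c(0.2, 0.3, 0.35, 0.1, 0.05)

# One weight per value: each range's probability split evenly over its values
w <- rep(probs / lens, times = lens)

# Range index (1 to 5) for each value, aligned with unlist(ranges)
grp <- rep(seq_along(ranges), times = lens)

# Total weight over all 47 values (1, up to floating-point rounding)
sum(w)

# Weights summed within each range should match probs
range_totals <- as.vector(tapply(w, grp, sum))
all.equal(range_totals, probs)
```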

CodePudding user response:

I would approach it like this:

set.seed(1)
r <- list(18:24, 25:34, 35:44, 45:54, 55:64)
p <- c(0.2, 0.3, 0.35, 0.1, 0.05)
# draw round(100 * p) values from each range, then shuffle the combined vector
sample(unlist(lapply(seq_along(r), \(x) sample(r[[x]], size = round(100 * p[[x]]), replace = TRUE))))

This places the ranges in a list and then, along the length of that list, samples from each range a number of values proportional to p (20, 30, 35, 10 and 5 out of 100). The outer sample() shuffles the combined result.

Output from a run with the last probability set to 0.5 instead of 0.05 (50 values from 55:64, 145 values in total):

  [1] 19 23 25 59 38 19 34 23 34 46 55 60 22 57 61 22 41 56 42 57 43 59 56 41 19 33 47 57 64 43 34 30 57
 [34] 34 37 40 18 44 59 57 36 54 57 50 58 40 33 54 21 63 35 55 51 55 30 64 36 42 62 29 63 61 55 28 37 22
 [67] 18 34 40 41 62 33 60 61 26 35 24 58 31 35 61 63 62 46 20 41 58 33 19 30 29 59 27 39 29 41 33 64 40
[100] 64 42 63 23 60 28 20 28 24 56 56 35 54 40 20 62 54 32 24 33 61 45 55 29 57 32 41 60 56 31 55 41 18
[133] 40 60 40 43 37 57 33 37 30 42 62 38 64
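
Note that this approach draws a fixed number of values from each range rather than a random number. If the per-range counts should themselves vary from run to run, one possible alternative (a sketch, not part of the answer above) is to first draw a range index with probabilities p and then draw a value uniformly within the chosen range. The names n, idx, and out are illustrative:

```r
set.seed(1)
r <- list(18:24, 25:34, 35:44, 45:54, 55:64)
p <- c(0.2, 0.3, 0.35, 0.1, 0.05)
n <- 100

# Stage 1: pick a range index for each draw, weighted by p
idx <- sample(seq_along(r), size = n, replace = TRUE, prob = p)

# Stage 2: pick one value uniformly from the chosen range.
# Indexing via sample.int() avoids sample()'s special handling
# of a length-one numeric x, should a range hold a single value.
out <- vapply(idx, function(i) r[[i]][sample.int(length(r[[i]]), 1)], integer(1))

length(out)  # n values in total
table(idx)   # how many draws landed in each range
```

With this variant the share of each range only approximates p for a given run, whereas the fixed-count version matches it exactly by construction.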