Ruby - Pick one element from array by possibility-CodePudding

I have an array with 3 elements and I want to pick one and add that into another array base on possibility.

For example, num 1 has 5% chance to be picked, num 2 has 60% chance to be picked and num 3 has 35% chance to be picked.

arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]

I found below methods from stackoverflow, just wondering if this would work? Or there is another way to do it?

def get_num(arr)
    case rand(100)   1
    when 1..5
        p arr[0]
    when 6..65
        p arr[1]
    when 66..100
        p arr[2]
    end
end

get_num(arr)

Thanks!

CodePudding user response：

Your code is fine but I'd probably write it as follows.

CDF = [[0.05,0], [0.65,1], [1,2]]

def get_num(arr)
  n = rand
  arr[CDF.find { |mx,_| n <= mx }.last]
end

arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]

get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>3, :diff=>25}
get_num(arr)
  #=> {:num=>1, :diff=>-29}
get_num(arr)
  #=> {:num=>2, :diff=>5}

n = 1_000
n.times
 .with_object(Hash.new(0)) { |_,h| h[get_num(arr)]  = 1 }
 .transform_values { |v| v.fdiv(n) }
  #=> {{:num=>2, :diff=>5} =>0.612,
  #    {:num=>3, :diff=>25}=>0.328,
  #    {:num=>1, :diff=>-29}=>0.06}

n = 100_000
n.times
 .with_object(Hash.new(0)) { |_,h| h[get_num(arr)]  = 1 }
 .transform_values { |v| v.fdiv(n) } 
  #=> {{:num=>3, :diff=>25} =>0.34818,
  #    {:num=>1, :diff=>-29}=>0.04958,
  #    {:num=>2, :diff=>5}  =>0.60224}

Notice that if many hashes are randomly drawn the fraction of each hash drawn approaches its specified population probability as the sample size is increased.

"CDF" stands for cumulative distribution function.

Here is another way that is more efficient (because it does not use find, which performs a linear search) but uses more memory.

CHOICE = Array.new(100) do |i|
                          if i < 5
                            0
                          elsif i < 65
                            1
                          else
                            2
                          end
                        end
  #=> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2]

def get_num(arr)
  arr[CHOICE[rand(100)]]
end

  #=> {{:num=>2, :diff=>5} =>0.60029,
  #    {:num=>3, :diff=>25}=>0.35022,
  #    {:num=>1, :diff=>-29}=>0.04949}

CodePudding user response：

Here's a small variation / addition to Cary's great answer.

Instead of calculating the cumulative sums yourself, you can let Ruby build it for you out of the initial probabilities:

probs = [5, 60, 35]

sum = 0
sums = probs.map { |x| sum  = x }
#=> [5, 65, 100]

we can now calculate a random number between 0 and the total sum and find the corresponding index:

r = rand(sum)                  #=> 37
sums.find_index { |i| r < i }  #=> 1

Note that the initial probabilities don't have to sum to 100. instead of [5, 60, 35] you could also use:

probs = [1, 12, 7]

You can wrap the above code into a method:

def random_index(*probs)
  sum = 0
  sums = probs.map { |x| sum  = x }
  r = rand(sum)
  sums.find_index { |i| r < i }
end

random_index(5, 60, 35) #=> 1
random_index(5, 60, 35) #=> 1
random_index(5, 60, 35) #=> 2

You could also make the method return a proc / lambda that can be reused:

def random_index_proc(*probs)
  sum = 0
  sums = probs.map { |x| sum  = x }
  -> {
    r = rand(sum)
    sums.find_index { |i| r < i }
  }
end

prc = random_index_proc(5, 60, 35)

prc.call #=> 1
prc.call #=> 1
prc.call #=> 0

Last not least, you can also pre-populate an array this way: (using Cary's naming convention)

CHOICE = [5, 60, 35].flat_map.with_index { |v, i| [i] * v }

and get a random element via:

def get_num(arr)
  arr[CHOICE.sample]
end

To keep the array small, you should prefer [1, 12, 7] (20 elements) over [5, 60, 35] (100 elements). With a little help from gcd you don't even have to calculate it yourself:

probs = [5, 60, 35]

gcd = probs.reduce { |a, b| a.gcd(b) }
#=> 5

probs.map { |i| i / gcd }
#=> [1, 12, 7]