I have three rarity classes with the following probabilities:
class S has 10% probability
class A has 30% probability
class B has 60% probability
So I coded it like this:
pool = ["S", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
11.times do
  # rand(10) already returns 0..9, so no "- 1" offset is needed
  puts pool[rand(10)]
end
and the result looks about right to me (correct me if I'm wrong):
B
A
B
B
S
B
A
S
B
A
B
But I got confused when I had to add another class and change the S class probability to 1%.
The probabilities now become:
class S has 1% probability
class A has 29% probability
class B has 30% probability
class C has 40% probability
I am not sure I can build a pool array like the one above, because 1% of a 10-element pool is 1/10 of an element, which is not an integer.
Any help is appreciated, thanks!
CodePudding user response:
Consider using the aliastable gem. It’s a Ruby implementation of the alias method published by A.J. Walker back in 1974.
The alias method uses O(n) time to set up the table with n outcomes, then O(1) time per value to generate. By contrast, setting up a CDF and then doing a binary search for values would take O(n) and O(log n) time respectively.
This little demo:
require 'aliastable'
values = %w(S A B C)
probs = [1r/100, 29r/100, 3r/10, 2r/5] # r => Rational
my_distribution = AliasTable.new(values, probs)
results = Array.new(1_000_000) { my_distribution.generate }
p results.tally
produces results such as
{"C"=>400361, "B"=>300121, "A"=>289636, "S"=>9882}
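For comparison, the CDF-plus-binary-search approach mentioned above could look roughly like the sketch below. It only uses core Ruby (Array#bsearch_index); the method name draw and the variable names are made up for illustration and are not part of the aliastable gem.

```ruby
# Sketch of inverse-CDF sampling; `draw` is a hypothetical helper name.
values = %w(S A B C)
probs  = [1r/100, 29r/100, 3r/10, 2r/5]

# O(n) setup: running totals of the probabilities -> [1/100, 3/10, 3/5, 1]
total = 0
cdf = probs.map { |pr| total += pr }

# O(log n) per value: binary search for the first cumulative
# probability that is strictly greater than the uniform draw u.
def draw(values, cdf, u = rand)
  values[cdf.bsearch_index { |c| u < c }]
end

results = Array.new(10_000) { draw(values, cdf) }
p results.tally
```

Because rand returns a Float in [0, 1) and the last cumulative value is exactly 1, the search always finds an index.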
CodePudding user response:
pool = 100.times.map do
  r = rand          # Float in [0, 1)
  if r <= 0.01      # 1%
    'S'
  elsif r <= 0.30   # next 29%
    'A'
  elsif r <= 0.6    # next 30%
    'B'
  else              # remaining 40% (class C in the question)
    'D'
  end
end
p pool
p pool.tally
This would output something like
["D", "B", "D", ....]
{"D"=>39, "B"=>28, "A"=>31, "S"=>2}
You could also force rand to return an Integer:
r = rand(1..100)
and then check for integers
if r <= 1
or use a case statement and check for ranges, as in the answer linked by Stefan.
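A rough sketch of that case-statement variant, using the question's four classes and an integer draw from 1 to 100. The method name rarity is only illustrative.

```ruby
# Sketch: map an integer draw in 1..100 to a class via ranges.
# `rarity` is a hypothetical name; the default argument is re-evaluated on every call.
def rarity(r = rand(1..100))
  case r
  when 1      then 'S'  # 1 value   -> 1%
  when 2..30  then 'A'  # 29 values -> 29%
  when 31..60 then 'B'  # 30 values -> 30%
  else             'C'  # 61..100, 40 values -> 40%
  end
end

p Array.new(10_000) { rarity }.tally
```

Counting the integers in each range gives the percentages directly, so no floating-point comparison is involved.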
CodePudding user response:
Just for fun, I adapted @Pascal's answer into a class with an Enumerator:
class ProbabilityGenerator
  include Enumerable
  attr_reader :probabilities
  def initialize(probabilities)
    raise ArgumentError unless probabilities.values.sum == 1.0
    @probabilities = prioritize(probabilities)
    # Infinite stream: each draw yields the first element whose
    # cumulative probability is at least the random value.
    @enum = Enumerator.new do |y|
      v = rand
      loop do
        y << @probabilities.find {|element, stat| v <= stat}.first
        v = rand
      end
    end
  end
  def each(&block)
    @enum.each(&block)
  end
  def next
    @enum.next
  end
  private
  # Sort by probability, then turn each value into a running (cumulative) total.
  def prioritize(probabilities)
    probabilities
      .transform_values(&:to_r)
      .sort_by {|k,v| v}
      .tap {|a| a.each_cons(2) {|b,c| c[1] = b[1] + c[1]} }
      .to_h
  end
end
You can use it as follows:
pb = ProbabilityGenerator.new(s: 0.01, b: 0.6, a: 0.3, d: 0.09)
pb.take(10)
#=> [:a, :b, :d, :d, :b, :b, :b, :b, :d, :b]
# or just keep generating numbers
pb.next
#=> :a
pb.next
#=> :b
Running the same million-draw check as @pjs:
pb.take(1_000_000).tally
# => {:a=>300033, :b=>600362, :d=>89687, :s=>9918}