Is there a way to calculate probability with probability tree instead of simulation-CodePudding

I want to implement a function in python that gives me the exact probability (if possible in fraction form) of my following problem:

You have a list of 8 elements let's say l=[1,2,3,4,5,6,7,8], then you take succesively k number in that list, if the number is 1,2,3 or 4 then you take it off that list, otherwise you let that list as it is. The probability of having every number is equivalent.

I want to calculate the probability, within these n tries, to have at least the numer 1 and the probability to have 1 and 2.

For n=1 the probability to have 1 is 1/8

For n=2 the probability to have 1 is 1/8 4/8 * 1/8 3/8 * 1/7= 27/112

to have 1 and 2 is 2 * 1/8 * 1/7 = 1/28

etc..

However as I can't formalize that probability into a formula. I tried to calculated it with an algorithm but it isn't simple.

I was able to simulate it in order to have an approximation but I'm not really satisfied with it.

def amalsimu2(adapt,n=int(1e6)):
    prob_to_find_both=0
    prob_to_find_one=0
    for _ in range(n):
        l=[i for i in range(1,9)]
        rec=[]
        for _ in range(adapt):
            draw=l[rd.randint(0,len(l)-1)]
            rec.append(draw)
            if draw<5:
                l.remove(draw)
        if 1 in rec:
            prob_to_find_one =1
            if 2 in rec:
                prob_to_find_both =1
    return [round(prob_to_find_one/n*100,3), round(n/prob_to_find_one,3)],[round(prob_to_find_both/n*100,3), round(n/prob_to_find_both,3)]

I looked a little bit into tree in python but I don't know if it is a good way to process.

If you have any idea on how to formulize the probability I want to compute or if you have a good idea on how to process to make it with a python algorithm, I would really appreciate that

CodePudding user response：

It's more like a probability graph than a tree, but that depends on how you define a state (a node in the graph). The code below defines "a state" as the tuple of integers that can still be chosen. The great advantage is that there are only 16 possible states then (depending on which subset of {1, 2, 3, 4} has already been picked), and the fewer the states the faster this goes.

The disadvantage is that having so few possible states constrains questions that can be answered. For example, no state records whether a 5 has ever been picked. But I don't know what you meant by "etc.", and both your description and your code only asked about prob_to_find_one and prob_to_find_both, The states used here are sufficient to answer those.

def crunch():
    from fractions import Fraction
    from collections import defaultdict

    step = 0
    state2prob = {tuple(range(1, 9)) : Fraction(1)}
    while True:
        step  = 1
        new_state2prob = defaultdict(Fraction)
        for state, prob in state2prob.items():
            nchoices = len(state)
            for choice in state:
                newstate = state
                if 1 <= choice <= 4:
                    newstate = list(state)
                    newstate.remove(choice)
                    newstate = tuple(newstate)
                new_state2prob[newstate]  = prob / nchoices
        state2prob = new_state2prob
        assert sum(state2prob.values()) == 1
        prob1 = prob12 = Fraction(0)
        for state, prob in state2prob.items():
            if 1 not in state:
                prob1  = prob
                if 2 not in state:
                    prob12  = prob
        print(f"{step=}")
        print(f"    {prob1=} {float(prob1):7.2%}")
        print(f"    {prob12=} {float(prob12):7.2%}")

The output starts like so:

step=1
    prob1=Fraction(1, 8)  12.50%
    prob12=Fraction(0, 1)   0.00%
step=2
    prob1=Fraction(27, 112)  24.11%
    prob12=Fraction(1, 28)   3.57%
step=3
    prob1=Fraction(545, 1568)  34.76%
    prob12=Fraction(115, 1176)   9.78%
step=4
    prob1=Fraction(146201, 329280)  44.40%
    prob12=Fraction(43739, 246960)  17.71%
step=5
    prob1=Fraction(36652993, 69148800)  53.01%
    prob12=Fraction(13765027, 51861600)  26.54%
step=6
    prob1=Fraction(8796724649, 14521248000)  60.58%
    prob12=Fraction(3876980411, 10890936000)  35.60%
step=7
    prob1=Fraction(2047820152657, 3049462080000)  67.15%
    prob12=Fraction(1015122839923, 2287096560000)  44.38%
step=8
    prob1=Fraction(466169430547001, 640387036800000)  72.79%
    prob12=Fraction(252512187968939, 480290277600000)  52.57%
step=9
    prob1=Fraction(104336675177661793, 134481277728000000)  77.58%
    prob12=Fraction(60499089868078627, 100860958296000000)  59.98%

Note that while the Fraction type automatically converts to lowest terms, the numerators and denominators grow large quickly. This is exact rational arithmetic.