Home > Mobile >  Having trouble solving simulation
Having trouble solving simulation

Time:12-05

I got a question related to probability theory and I tried to solve it by simulating it in R. However, I ran into a problem as the while loop does not seem to break.

The question is asking: How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December?

Here is my code:

prob <- 0 
people <- 1 

while (prob <= 0.7) {
  people <- people   1 #start the iteration with 2 people in the room and increase 1 for every iteration
  birthday <- sample(365, size = people, replace = TRUE) 
  prob <- length(which(birthday == 365)) / people
}
return(prob)

My guess is that it could never hit 70%, therefore the while loop never breaks, am I right? If so, did I interpret the question wrongly?

I did not want to post this on stats.stackexchange.com because I thought this is more related to code rather than math itself, but I will move it if necessary, thanks.

CodePudding user response:

You are solving the wrong problem. The question is, "How many people are needed such that there is at least a 70% chance that one of them is born on the last day of December?". What you are finding now is "How many people are needed such that 70% have their birthdays on the last day of December?". The answer to the second question is close to zero. But the first one is much simpler.

Replace prob <- length(which(birthday == 365)) / people with check = any(birthday == 365) in your logic because at least one of them has to be born on Dec 31. Then, you will be able to find if that number of people will have at least one person born on Dec 31.

After that, you will have to rerun the simulation multiple times to generate empirical probability distribution (kind of Monte Carlo). Only then you can check for probability.

Simulation Code

people_count = function(i)
{
  set.seed(i)
  for (people in 1:10000)
  {
    birthday = sample(365, size = people, replace = TRUE)
    check = any(birthday == 365)
    if(check == TRUE)
    {
      pf = people
      break
    }
  }
  return(pf)
}

people_count() function returns the number of people required to have so that at least one of them was born on Dec 31. Then I rerun the simulation 10,000 times.

# Number of simulations
nsim = 10000
l = lapply(1:nsim, people_count) %>%
  unlist()

Let's see the distribution of the number of people required.

histogram of empirical distribution

To find actual probability, I'll use cumsum().

> cdf = cumsum(l/nsim)
> which(cdf>0.7)[1]
[1] 292

So, on average, you would need 292 people to have more than a 70% chance.

CodePudding user response:

Indeed, your probability will (almost) never reach 0.7, because you hardly will hit the point where exactly 1 person has got birthday = 365. When people gets larger, there will be more people having a birthday = 365, and the probability for exactly 1 person will decrease.

Furthermore, to calculate a probability for a given number of persons, you should draw many samples and then calculate the probability. Here is a way to achieve that:

N = 450  # max. number of peoples being tried
probs = array(numeric(), N)  # empty array to store found probabilities

# try for all people numbers in range 1:N
for(people in 1:N){
  # do 200 samples to calculate prop
  samples = 200
  successes = 0
  for(i in 1:samples){
    birthday <- sample(365, size = people, replace = TRUE)
    total_last_day <- sum(birthday == 365)
    if(total_last_day >= 1){
      successes <- successes   1
    }
  }
  # store found prop in array
  probs[people] = successes/samples
}

# output of those people numbers that achieved a probability of > 0.7
which(probs>0.7)

As this is a simulation, the result depends on the run. Increasing the sample rate would make the result more stable.

CodePudding user response:

This is a case where an analytical solution based on probability is easier and more accurate than trying to simulate. I agree with Harshvardhan that your formulation is solving the wrong problem.

The probability of having at least one person in a pool of n have their birthday on a particular target date is 1-P{all n miss the target date}. This probability is at least 0.7 when P{all n miss the target date} < 0.3. The probability of each individual missing the target is assumed to be P{miss} = 1-1/365 (365 days per year, all birthdates equally likely). If the individual birthdays are independent, then P{all n miss the target date} = P{miss}^n.

I am not an R programmer, but the following Ruby should translate pretty easily:

# Use rationals to avoid cumulative float errors.
# Makes it slower but accurate.
P_MISS_TARGET = 1 - 1/365r   
p_all_miss = P_MISS_TARGET
threshold = 3r / 10   # seeking P{all miss target} < 0.3
n = 1
while p_all_miss > threshold
  p_all_miss *= P_MISS_TARGET
  n  = 1
end
puts "With #{n} people, the probability all miss is #{p_all_miss.to_f}"

which produces:

With 439 people, the probability all miss is 0.29987476838793214

  • Related