I apologize in advance for any confusing explanations, but I will try to be as clear as possible.
If there are multiple indicators that predict an outcome with a known accuracy, and they are all attempting to predict the same result, how do you properly combine the probabilities?
For example, if John and David are taking a test, and historically John answers 80% of questions correctly, and David answers 75% of questions correctly, and both John and David select the same answer on a question, what is the probability that they are correct? Let's assume that John and David are completely independent of each other and that all questions are equally difficult.
I would think that the probability that they are correct is higher than 80%, so I don't think averaging makes sense.
CodePudding user response:
Thanks to Robert, who commented on this question, I was able to figure out that what I was looking for is a well-known problem solved by Bayes' Theorem, which is used to re-evaluate existing probabilities given new information. I won't go further into the intuition behind it, but 3Blue1Brown has a very good video on the topic.
Bayes' Theorem, applied to this scenario, states: P(A|B) = (P(A)*P(B)) / (P(A)*P(B) + P(!A)*P(!B))
Where: P(A) is the first accuracy, P(!A) is 1 - P(A), P(B) is the second accuracy, and P(!B) is 1 - P(B)
Using this equation in the scenario in the question, if John has an 80% chance of being right and David has a 75% chance of being right, and both agree, then the chance that they are both correct is (0.8 * 0.75) / (0.8 * 0.75 + 0.2 * 0.25) = 0.6 / 0.65 ≈ 92.3%.
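As a quick sanity check, the closed-form version of this can be written as a tiny helper function (the function name `consensus_accuracy` is just my choice here):

```python
def consensus_accuracy(p_a: float, p_b: float) -> float:
    """Probability that two independent predictors are both correct,
    given that they agree on the same answer."""
    agree_right = p_a * p_b              # both correct
    agree_wrong = (1 - p_a) * (1 - p_b)  # both incorrect
    return agree_right / (agree_right + agree_wrong)

print(consensus_accuracy(0.8, 0.75))  # ≈ 0.923
```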
To verify this, I wrote a simple Python script that simulates this exact scenario many times and prints out the result. In this code, two "experts" have a set probability of being right or wrong, and their accuracy is tracked individually and together.
import random

TRIALS = 1_000_000

exp1_correct = 0
exp2_correct = 0
combined_correct = 0
consensus_count = 0

for _ in range(TRIALS):
    # each expert is independently correct with their own probability
    expert1 = random.random() <= 0.8
    expert2 = random.random() <= 0.75
    if expert1 and expert2:
        combined_correct += 1
    if expert1:
        exp1_correct += 1
    if expert2:
        exp2_correct += 1
    if expert1 == expert2:
        # the experts agree: both right or both wrong
        consensus_count += 1

print(f'Expert 1 had an accuracy of {exp1_correct / TRIALS}')
print(f'Expert 2 had an accuracy of {exp2_correct / TRIALS}')
print(f'Consensus had an accuracy of {combined_correct / consensus_count}')
Running this verifies that the equation above is correct. Hopefully this is helpful to someone that has the same question that I did!