I need help with generating random numbers.
p - The probability that a pcr test of a sick person, will show us a positive result. q - The probability that a pcr test of a healthy person, will show us a positive result. s - The general percentage of patients in the population.
Let q = 10%, p = 95% and s = 20%
With this information, I need to generate data for 1000 people who came to be tested.
What I tried to do?
s = 0.2
p = 0.95
q = 0.1
true_result = c()
test_result = c()
for (i in 1:1000){
x = runif(1,0,1)
if (x<=s){
true_result = append(true_result,1)
r = runif(1,0,1)
if (r<=p){
test_result = append(test_result,1)
}else{
test_result = append(test_result,0)
}
}else{
true_result = append(true_result,0)
r = runif(1,0,1)
if (r<=q){
test_result = append(test_result,1)
}
else{
test_result = append(test_result,0)
}
}
}
sick = sum(true_result)
tested_sick = sum(test_result)
print(sick)
print(tested_sick)
I'm not sure if this is true and I wanted to know maybe there is another way to do it?
CodePudding user response:
How about something like this:
# Sick and positive: .2*.95 = .19
# Sick and negative: .2*.05 = .01
# Healthy and positive: .8*.1=.08
# Healthy and negative: .8*.9 = .72
samp <- sample(1:4, 1000, replace=TRUE, prob=c(.19,.01,.08,.72))
stat <- c("sick", "sick", "healthy", "healthy")
tst <- c("positive", "negative", "positive", "negative")
test_data <- data.frame(
health = stat[samp],
test_result = tst[samp]
)
head(test_data)
#> health test_result
#> 1 healthy negative
#> 2 healthy negative
#> 3 healthy negative
#> 4 sick positive
#> 5 sick positive
#> 6 healthy negative
Created on 2022-06-25 by the reprex package (v2.0.1)
CodePudding user response:
Without making any calculations, but making use of vectorization, this can be simulated with just two lines of code:
#Setting the inputs
seed<-44
s<-0.2; q<-0.1; p<-0.95
niter<-10000
#Simulation:
sick<-rbinom(niter,1,s)
positive<-rbinom(niter, 1, ifelse(sick==0,q,p))
Let's take a look at the risults, which are consistent with @DaveArmstrong probabilities (you may have different results if you used another seed):
table(sick, positive)
# positive
#sick 0 1
# 0 7260 769
# 1 87 1884