Home > Enterprise >  R: How to generate the following data
R: How to generate the following data

Time:06-26

I need help with generating random numbers.

p - The probability that a pcr test of a sick person, will show us a positive result. q - The probability that a pcr test of a healthy person, will show us a positive result. s - The general percentage of patients in the population.

Let q = 10%, p = 95% and s = 20%

With this information, I need to generate data for 1000 people who came to be tested.

What I tried to do?

s = 0.2
p = 0.95
q = 0.1

true_result = c()
test_result = c()

for (i in 1:1000){
  x = runif(1,0,1)
  if (x<=s){
    true_result = append(true_result,1)
    r = runif(1,0,1)
    if (r<=p){
      test_result = append(test_result,1)
  
    }else{
      test_result = append(test_result,0)
    }
    
  }else{
    true_result = append(true_result,0)
    r = runif(1,0,1)
    if (r<=q){
      test_result = append(test_result,1)
    }
    else{
      test_result = append(test_result,0)
    }
  }
}



sick = sum(true_result)
tested_sick = sum(test_result)

print(sick)
print(tested_sick)

I'm not sure if this is true and I wanted to know maybe there is another way to do it?

CodePudding user response:

How about something like this:

# Sick and positive: .2*.95 = .19
# Sick and negative: .2*.05 = .01
# Healthy and positive: .8*.1=.08
# Healthy and negative: .8*.9 = .72

samp <- sample(1:4, 1000, replace=TRUE, prob=c(.19,.01,.08,.72))

stat <- c("sick", "sick", "healthy", "healthy")
tst <- c("positive", "negative", "positive", "negative")
test_data <- data.frame(
  health = stat[samp], 
  test_result = tst[samp]
)
head(test_data)
#>    health test_result
#> 1 healthy    negative
#> 2 healthy    negative
#> 3 healthy    negative
#> 4    sick    positive
#> 5    sick    positive
#> 6 healthy    negative

Created on 2022-06-25 by the reprex package (v2.0.1)

CodePudding user response:

Without making any calculations, but making use of vectorization, this can be simulated with just two lines of code:

#Setting the inputs
seed<-44
s<-0.2; q<-0.1; p<-0.95
niter<-10000

#Simulation:

sick<-rbinom(niter,1,s)
positive<-rbinom(niter, 1, ifelse(sick==0,q,p))

Let's take a look at the risults, which are consistent with @DaveArmstrong probabilities (you may have different results if you used another seed):

table(sick, positive)
#    positive
#sick    0    1
#   0 7260  769
#   1   87 1884
  • Related