Simulate attrition in R

Time:10-26

I simulated the following general "survey experiment" data:

n <- 100
df <- data.frame(
  Q1  = sample(18:90, n, replace = TRUE),                         # age
  Q2  = sample(c("m", "f"), n, replace = TRUE),                   # sex
  Q3  = sample(c(0, 1), n, replace = TRUE, prob = c(0.55, 0.45)), # other general pre-treatment questions
  Q4  = sample(c(0, 1), n, replace = TRUE),
  Q5  = sample(c(0, 1), n, replace = TRUE),                       # treatment
  Q6  = sample(c(0, 1), n, replace = TRUE),                       # post-treatment
  Q7  = sample(c(0, 1), n, replace = TRUE),
  Q8  = sample(c(0, 1), n, replace = TRUE),
  Q9  = sample(c(0, 1), n, replace = TRUE),
  Q10 = sample(c(0, 1), n, replace = TRUE)
)

I'd like to randomly simulate attrition, i.e. missing (NA) responses. The following question deals with a similar issue: How do I add random `NA`s into a data frame

However, I'm interested in generating data that simulates respondents who left the survey completely, so that every question after the point where they dropped out is NA. It might look something like this:

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
18  m  1  0 NA NA NA NA NA  NA
30  f NA NA NA NA NA NA NA  NA
25  f  1  0  1  0 NA NA NA  NA

Thanks!

CodePudding user response:

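In base R, you can pick a random drop-out column for each respondent and set that column and every column after it to NA: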

for (i in seq_len(nrow(df))) {
  a <- sample(3:10, 1)       # first question this respondent did not answer
  df[i, a:ncol(df)] <- NA    # blank that question and every later one
}

head(df)

This gives:

  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
1 29  f  1  1  1  0 NA NA NA  NA
2 59  f NA NA NA NA NA NA NA  NA
3 48  m  1  0 NA NA NA NA NA  NA
4 38  m  0  1  0 NA NA NA NA  NA
5 30  f  1  1  0  0 NA NA NA  NA
6 57  m  1  1  1  1  0 NA NA  NA
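
Because the drop-out column in the answer above is always drawn from 3:10, every respondent loses at least Q10, so nobody finishes the survey. If some completers are wanted, one possible variation is to draw from one position past the last column and treat that draw as "finished". The sketch below is only an illustration: `simulate_attrition`, its `first_col` argument, the seed, and the stand-in `df` are assumptions made for the example, not part of the original question or answer.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Stand-in for the question's df (same layout: Q1..Q10)
n <- 100
df <- data.frame(
  Q1 = sample(18:90, n, replace = TRUE),
  Q2 = sample(c("m", "f"), n, replace = TRUE)
)
for (q in paste0("Q", 3:10)) df[[q]] <- sample(0:1, n, replace = TRUE)

# Hypothetical helper: draw the first unanswered column for each respondent.
# A draw of ncol(data) + 1 means the respondent answered everything.
simulate_attrition <- function(data, first_col = 3) {
  p <- ncol(data)
  drop_at <- sample(first_col:(p + 1), nrow(data), replace = TRUE)
  # col(data) holds each cell's column index; drop_at is recycled down the
  # rows, so the cell in row i is compared with drop_at[i]
  data[col(data) >= drop_at] <- NA
  data
}

df_att <- simulate_attrition(df)
head(df_att)
```

Passing a `prob` vector to `sample()` inside the helper would be one way to make early or late drop-out more likely, if uniform drop-out points are not realistic for the survey being simulated.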