How to sample a binary output with a fixed number of events (i.e. 1) in R?

Time:08-03

Suppose I have a data frame like:

set.seed(123)
df <- data.frame(x=rbinom(100,1,0.9), y=rbinom(100,1,0.95))

What I want is to sample a subset, df_sub, from df in which exactly 5 rows have both x==1 and y==1, regardless of the total number of rows in df_sub, like this:

## index <- sample(1:nrow(df),..,replace = FALSE)
df_sub <- df[index,]
df_sub
    x y
1   1 1
2   1 1
3   1 1
4   1 0
5   0 1
6   1 1
7   1 1

As you can see, in df_sub the number of rows with x==1 & y==1 is 5, while the total number of rows is 7. I would like to sample the original df so that exactly 5 rows have x==1 & y==1, whatever the actual number of rows in df_sub turns out to be.
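A minimal sketch of one way to do this directly on df: draw the 5 "both 1" rows from the rows where x==1 & y==1, draw the remaining rows from everything else, then shuffle. It assumes you decide how many other rows to include; n_other is a hypothetical parameter set to 2 here only to mirror the 7-row example.

```r
set.seed(123)
df <- data.frame(x = rbinom(100, 1, 0.9), y = rbinom(100, 1, 0.95))

n_events <- 5                     # rows that must have x == 1 & y == 1
is_event <- df$x == 1 & df$y == 1

event_rows <- which(is_event)     # candidate rows with both x and y equal to 1
other_rows <- which(!is_event)    # all remaining rows

# Hypothetical choice: include up to 2 rows that are not "both 1"
n_other <- min(2, length(other_rows))

# Index the candidate vectors via sample.int() so that a length-1
# candidate vector is not misread by sample() as 1:n
index <- c(event_rows[sample.int(length(event_rows), n_events)],
           other_rows[sample.int(length(other_rows), n_other)])

# Shuffle so the event rows are not all listed first
index <- index[sample.int(length(index))]

df_sub <- df[index, ]
df_sub
```

The row names of df_sub are the original row numbers of df, so they will not read 1 to 7 as in the example above. Changing n_other changes the total size without affecting the fixed count of 5.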

Answer:

We can use rep with sample:

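n_events <- 20                          # number of 1s wanted
total_len <- 70                         # total length of the vector
n_zero_events <- total_len - n_events   # number of 0s
# rep() builds 20 ones followed by 50 zeros; sample() shuffles them
v1 <- sample(rep(c(1, 0), c(n_events, n_zero_events)))
> sum(v1)
[1] 20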
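The count is fixed because sample() called with a single vector argument returns a random permutation of that vector: only the positions of the 1s change, never how many there are. A quick sketch that checks this over many draws (the 1000 replicates are an arbitrary choice):

```r
n_events <- 20
total_len <- 70

# Each draw is a permutation of 20 ones and 50 zeros
draws <- replicate(1000, sample(rep(c(1, 0), c(n_events, total_len - n_events))))

# replicate() simplifies to a total_len x 1000 matrix, one draw per column
colSums(draws)[1:5]
# [1] 20 20 20 20 20
```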

Answer:

A base R one-liner using sample, rep, and replace:

> sample(replace(rep(0, 100), 1:20, 1))
  [1] 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
 [38] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 [75] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0
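An equivalent sketch picks the positions of the 1s directly instead of shuffling the whole vector. sample.int(n, size) draws without replacement by default, so the 20 positions are distinct and the vector ends up with exactly 20 ones:

```r
n <- 100        # vector length
n_events <- 20  # number of 1s

v2 <- integer(n)                   # n zeros
v2[sample.int(n, n_events)] <- 1L  # set 20 distinct random positions to 1
sum(v2)
# [1] 20
```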