Suppose I have a data frame like:
set.seed(123)
df <- data.frame(x=rbinom(100,1,0.9), y=rbinom(100,1,0.95))
What I want is to sample a subset, df_sub, from df such that the number of rows with both x == 1 and y == 1 equals 5, regardless of the total number of rows in df_sub. For example:
## index <- sample(1:nrow(df),..,replace = FALSE)
df_sub <- df[index,]
df_sub
x y
1 1 1
2 1 1
3 1 1
4 1 0
5 0 1
6 1 1
7 1 1
As you can see, in df_sub the number of rows with x == 1 & y == 1 equals 5, while the total number of rows equals 7. I would like to sample the original df so that exactly 5 rows have x == 1 & y == 1, regardless of how many rows df_sub ends up with.
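To make the requirement concrete, the condition "rows with both x == 1 and y == 1" can be counted with a small helper. This is a sketch; the function name count_both_one and the hand-built data frame df_sub_example are hypothetical and only rebuild the 7-row example shown above.

```r
# Hypothetical helper: counts rows where both x and y equal 1
count_both_one <- function(d) {
  sum(d$x == 1 & d$y == 1)
}

# Rebuild the 7-row df_sub example from the question by hand
df_sub_example <- data.frame(x = c(1, 1, 1, 1, 0, 1, 1),
                             y = c(1, 1, 1, 0, 1, 1, 1))

count_both_one(df_sub_example)  # 5
nrow(df_sub_example)            # 7
```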
Answer:
We may use rep with sample:
n_events <- 20
total_len <- 70
n_zero_events <- total_len - n_events
v1 <- sample(rep(c(1, 0), c(n_events, n_zero_events)))
> sum(v1)
[1] 20
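Applied to the rows of df from the question, the same "fix the count, then shuffle" idea might look like the sketch below: draw exactly 5 row indices from the rows where x == 1 & y == 1, draw some number of the remaining rows, and shuffle the combined indices. The names n_both, n_other, both_idx and other_idx are hypothetical, and n_other = 2 is an arbitrary choice that reproduces a 7-row result like the one in the question.

```r
set.seed(123)
df <- data.frame(x = rbinom(100, 1, 0.9), y = rbinom(100, 1, 0.95))

n_both  <- 5  # required number of rows with x == 1 & y == 1
n_other <- 2  # hypothetical: how many other rows to include (any value works)

both_idx  <- which(df$x == 1 & df$y == 1)     # rows meeting the condition
other_idx <- which(!(df$x == 1 & df$y == 1))  # all remaining rows

# sample.int() on the lengths avoids sample()'s special case for length-1 input;
# draw without replacement from each group, then shuffle the combined indices
index <- sample(c(both_idx[sample.int(length(both_idx), n_both)],
                  other_idx[sample.int(length(other_idx), n_other)]))

df_sub <- df[index, ]
sum(df_sub$x == 1 & df_sub$y == 1)  # 5 by construction
nrow(df_sub)                         # n_both + n_other
```

Because the 5 qualifying rows and the extra rows come from disjoint groups, the count of x == 1 & y == 1 rows in df_sub is always exactly n_both, whatever value n_other takes (as long as each group has enough rows to draw from).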
Answer:
A base R one-liner using sample, rep and replace:
> sample(replace(rep(0, 100), 1:20, 1))
[1] 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[75] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0
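Because replace() sets the first 20 of the 100 zeros to 1 before sample() shuffles them, the result always contains exactly 20 ones. A minimal sketch, assuming reproducibility is wanted: the seed value 42 is arbitrary, and which() recovers the randomly chosen positions, which could then be used as row indices.

```r
set.seed(42)  # arbitrary seed, added only so the draw is reproducible
v2 <- sample(replace(rep(0, 100), 1:20, 1))

sum(v2)     # always 20
length(v2)  # 100

pos <- which(v2 == 1)  # the 20 randomly chosen positions
length(pos)            # 20
```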