Random 0 or 1 for each of 33 patients, with equal numbers of zeros and ones (or at most one off when n is odd)

Ok...

I have 33 patients, each with two legs (coded 0 and 1).

I want to draw a random sample of 33 legs, one per patient, so that no patient contributes both the left and the right leg.

I tried the following (small example):

library(janitor)
data<-list()
df_HS<-data.frame()
data$x<-c(1,1,2,2,3,3,4,4,5,5,6,6)
data$y<-c(0,1,0,1,0,1,0,1,0,1,0,1)
df<-data.frame(data)

# x is subjectID
# y is leg (0=Left; 1=Right)

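k=0
for(i in unique(df$x)){
    k=k+1
    stratdf<-df[df$x==i,]    # both legs of patient i
    df_HS[k+1,1:ncol(stratdf)] <- stratdf[sample(nrow(stratdf), size=1), ]    # keep one leg at random
}
df_HS<-df_HS[-1,]    # drop row 1, which stays empty because filling starts at row 2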
tabyl(df_HS$y)


df_HS$y n   percent
    0 4 0.6666667
    1 2 0.3333333

However, I want exactly 3 zeros and 3 ones every time I run this script, or a difference of at most one when the number of patients is odd (e.g. 5 patients).

This is a small example; the actual dataset is bigger.
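To see why the loop above rarely gives a 50/50 split: each patient's leg is sampled independently with probability 0.5, so the number of right legs follows a Binomial(n, 0.5) distribution. A short sketch of that arithmetic, plus a hypothetical helper `is_balanced()` (my own name, not from any package) that encodes the "equal, or at most one off" requirement:

```r
# The question's loop draws each patient's leg independently with
# probability 0.5, so the number of 1s among n patients is Binomial(n, 0.5).
n <- 6
p_exact <- dbinom(n / 2, size = n, prob = 0.5)
print(p_exact)  # 0.3125: only about 1 run in 3 gives 3 zeros and 3 ones

# For the real data (n = 33), "at most one off" means 16 or 17 ones.
p33 <- sum(dbinom(c(16, 17), size = 33, prob = 0.5))
print(p33)  # about 0.27

# Hypothetical helper: TRUE when the counts of 0s and 1s differ by at most one.
is_balanced <- function(y) abs(sum(y == 0) - sum(y == 1)) <= 1

is_balanced(c(0, 1, 0, 1, 0, 1))  # TRUE
is_balanced(c(0, 0, 0, 0, 1, 1))  # FALSE (the 4/2 split from the output above)
is_balanced(c(0, 1, 0, 1, 0))     # TRUE (odd n, one off)
```

So any approach that samples each patient on its own can only hit the target by luck; the answers below fix the counts first and randomise which patient gets which leg.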

Thanks

Answer:

If each patient can contribute only one leg and you want the numbers of L and R to be as equal as possible, I would do the following.

Randomise the order of the patients (if you only want a subset, pass the subset size to sample(), e.g. sample(patients, size = 20), and use that size as n in the next steps):

n <- 33
patients <- 1:n
scramble <- sample(patients)

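Use the L leg (0) for the first half of the shuffled patients and the R leg (1) for the second half. With odd n the first "half" gets the extra patient, so there is one more L than R: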

leg <- c(rep(0, ceiling(n/2)), rep(1, floor(n/2)))
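As a quick check of the ceiling/floor split (a sketch that just prints the counts for a few example sizes): the two parts always add up to n and differ by at most one. Note that for odd n the extra patient always gets a left leg (0); if that systematic bias matters, the side of the odd one out could be chosen at random instead.

```r
for (n in c(5, 6, 33)) {
  leg <- c(rep(0, ceiling(n / 2)), rep(1, floor(n / 2)))
  stopifnot(length(leg) == n)  # every patient gets exactly one leg
  cat(n, "patients:", sum(leg == 0), "left,", sum(leg == 1), "right\n")
}
# 5 patients: 3 left, 2 right
# 6 patients: 3 left, 3 right
# 33 patients: 17 left, 16 right
```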

Arrange the result into a data frame and sort it by patient:

df <- data.frame(patient = scramble, leg)[order(scramble),]

head(df)
   patient leg
7        1   0
27       2   1
9        3   0
25       4   1
3        5   0
15       6   0
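The steps above can be wrapped into one function that works directly on the question's long data (one row per patient and leg, columns x and y). This is a sketch under that assumption; the name pick_one_leg is hypothetical. It merges the chosen (patient, leg) pairs back onto the original rows, so any other per-leg columns in the real dataset come along:

```r
# Hypothetical helper wrapping the steps above. Assumes one row per
# patient and leg, with x = patient ID and y = leg (0 = left, 1 = right).
pick_one_leg <- function(df, m = length(unique(df$x))) {
  ids <- unique(df$x)
  # Random order of patients; taking m of them gives a subset if m < n.
  # sample.int() avoids sample()'s surprise when ids has length one.
  ids <- ids[sample.int(length(ids), m)]
  # Left leg for the first half, right leg for the rest.
  legs <- c(rep(0, ceiling(m / 2)), rep(1, floor(m / 2)))
  chosen <- data.frame(x = ids, y = legs)
  # Keep only the matching (patient, leg) rows of the original data.
  out <- merge(df, chosen, by = c("x", "y"))
  out[order(out$x), ]
}

df <- data.frame(x = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
                 y = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1))
res <- pick_one_leg(df)
table(res$y)                      # 3 zeros and 3 ones on every run
table(pick_one_leg(df, m = 5)$y)  # 3 zeros and 2 ones
```

Using sample.int() on the positions rather than sample() on the IDs sidesteps the case where sample(5) would draw from 1:5 instead of returning the single ID 5.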

Answer:

Here is my attempt, which keeps the numbers of 0s and 1s the same (or one apart when the sample size is odd).

# Use dplyr for convenience (and janitor for tabyl())
library(dplyr)
library(janitor)
df <- structure(list(
    ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
    Leg = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
),
class = "data.frame",
row.names = c(NA,-12L))

sample_size <- 6
# Choose a random sample of IDs (in random order)
ids <- sample(unique(df$ID), size = sample_size, replace = FALSE)
# Create an alternating set of left and right legs
# so that there are as many 0s as 1s
legs <- rep(c(0, 1), floor(sample_size/2))
# For an odd sample size, add one more leg, randomly left or right
if (length(legs) < sample_size) {
    legs <- c(legs, ifelse(runif(1) > 0.5, 1, 0))
}

# Because the ids are selected randomly, the allocation of left or right 
# will be random
result <- data.frame(ID=ids, Leg=legs) %>% arrange(ID)

tabyl(result$Leg)
#result$Leg n percent
#0 3     0.5
#1 3     0.5

I sample the IDs randomly and then assign a fixed alternating sequence of 0s and 1s to them. Because the order of the IDs is random, which patient gets which leg is random too.
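The allocation step does not need dplyr. A base-R sketch of the same idea (the helper name balanced_legs is hypothetical), followed by a quick check over many simulated runs that the two counts never differ by more than one:

```r
# Hypothetical helper: alternating 0/1 for the even part, plus one leg of
# random side when m is odd (mirrors the runif() step in the answer).
balanced_legs <- function(m) {
  legs <- rep(c(0, 1), times = m %/% 2)
  if (m %% 2 == 1) {
    legs <- c(legs, sample(c(0, 1), 1))
  }
  legs
}

ids <- 1:33
# Pair shuffled IDs with the legs, then sort by ID.
result <- data.frame(ID = sample(ids), Leg = balanced_legs(length(ids)))
result <- result[order(result$ID), ]

# Repeat many times: the 0/1 counts are never more than one apart.
ok <- replicate(1000, {
  l <- balanced_legs(33)
  abs(sum(l == 0) - sum(l == 1)) <= 1
})
all(ok)  # TRUE
```

Choosing the side of the odd leg at random avoids the small left-leg surplus that a fixed ceiling/floor split always produces.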