Ok...
I have 33 patients, each with two legs (0 and 1).
I want to draw a random sample of 33 legs, one per patient, so the left and right leg of the same patient never both appear.
I tried the following (small example):
library(janitor)
data<-list()
df_HS<-data.frame()
data$x<-c(1,1,2,2,3,3,4,4,5,5,6,6)
data$y<-c(0,1,0,1,0,1,0,1,0,1,0,1)
df<-data.frame(data)
# x is subjectID
# y is leg (0=Left; 1=Right)
k=0
for(i in unique(df$x)){
k=k+1
stratdf<-df[df$x==i,]
df_HS[k+1,1:ncol(stratdf)] <- stratdf[sample(nrow(stratdf), size=1), ]
}
df_HS<-df_HS[-1,]
tabyl(df_HS$y)
 df_HS$y n   percent
       0 4 0.6666667
       1 2 0.3333333
However, I want 3 zeros and 3 ones every time I run this script, or counts that differ by at most one when the number of patients is odd (e.g. 5 patients).
This is a small example, the actual dataset is bigger.
Thanks
CodePudding user response:
If each patient can contribute only one leg, and you want the numbers of L and R to be as equal as possible, what I would do is:
- Randomise the order of patients (if you want only a subset, specify the number as the size argument of sample())
n <- 33
patients <- 1:n
scramble <- sample(patients)
- Use the L leg of the first half of the patients, and R of the second half
leg <- c(rep(0, ceiling(n/2)), rep(1, floor(n/2)))
- Arrange into a data frame and order by patient
df <- data.frame(patient = scramble, leg)[order(scramble),]
head(df)
patient leg
7 1 0
27 2 1
9 3 0
25 4 1
3 5 0
15 6 0
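The three steps above can be wrapped into one function. This is only a sketch: the name balanced_legs and its arguments are made up for illustration, and shuffling c(0, 1) to decide which side gets the extra leg when n is odd is an addition that is not in the steps above.

```r
# Hypothetical helper: one leg per patient, left/right counts differ by at most 1
balanced_legs <- function(ids, n = length(ids)) {
  # Random subset of n patients, in random order (no patient appears twice)
  chosen <- ids[sample.int(length(ids), n)]
  # Assumption: randomly decide which side gets the extra leg when n is odd
  sides <- sample(c(0, 1))
  leg <- c(rep(sides[1], ceiling(n / 2)), rep(sides[2], floor(n / 2)))
  out <- data.frame(patient = chosen, leg = leg)
  out[order(out$patient), ]
}

res <- balanced_legs(1:33)
table(res$leg)
```

With 33 patients, one side gets 17 legs and the other 16. Which side gets 17 varies from run to run.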
CodePudding user response:
Here's my attempt that keeps the number of 0 and 1 the same.
# Use dplyr for convenience; janitor provides tabyl()
library(dplyr)
library(janitor)
df <- structure(list(
ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
Leg = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
),
class = "data.frame",
row.names = c(NA,-12L))
sample_size <- 6
# Choose a random sample of IDs
ids <- sample(unique(df$ID), size = sample_size, replace=F)
# Create an alternating set of left and right legs
# so that the running sum is near 0
legs <- rep(c(0,1), sample_size/2)
# For an odd sample_size, rep() leaves legs one short; add one random leg
if (length(legs) < sample_size) {
legs <- c(legs, ifelse(runif(1) > 0.5, 1, 0))
}
# Because the ids are selected randomly, the allocation of left or right
# will be random
result <- data.frame(ID=ids, Leg=legs) %>% arrange(ID)
tabyl(result$Leg)
#result$Leg n percent
#0 3 0.5
#1 3 0.5
I sample the IDs randomly then allocate a sequence of 0 and 1 to these. The random IDs ensure the legs are allocated randomly.
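One way to check this is to repeat the sampling many times and look at the largest imbalance. This is a base R sketch: sample_legs is a made-up name, and rep_len() with a shuffled c(0, 1) replaces the rep()/runif() construction above, so it is a variant and not the exact code from this answer.

```r
# Hypothetical helper: random IDs paired with an alternating 0/1 pattern
sample_legs <- function(ids, sample_size) {
  picked <- ids[sample.int(length(ids), sample_size)]
  # rep_len() recycles the pattern to exactly sample_size values; shuffling
  # c(0, 1) first randomises which side gets the extra leg for odd sizes
  legs <- rep_len(sample(c(0, 1)), sample_size)
  data.frame(ID = picked, Leg = legs)
}

# Draw 5 of 6 patients 1000 times and record |#left - #right| each time
imbalance <- replicate(1000, {
  res <- sample_legs(1:6, 5)
  abs(sum(res$Leg == 0) - sum(res$Leg == 1))
})
max(imbalance)
```

For an odd sample size, the difference between the two counts is always exactly one. For an even size it is zero.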