Random sampling in R without direct repetition and exact quantity of each number-CodePudding

How can I randomly sample the color order of 368 images using 4 colors that

should not be repeated directly ("red" "red" "blue" would not be ok, but "red" "blue" "red" would be)
should each appear with an equal quantity (each 92 times because 368/4 = 92)?

Based on this, I have already managed the sampling without direct repetition:

library("dplyr")
set.seed(340)
values <- c("blue", "red", "green", "yellow")
len <- 368 # number of samples
samp <- sample(values, 1) # initialise variable
cols <- sapply(2:len, function(i) samp[i] <<- sample(setdiff(values, samp[i-1]), 1, replace = TRUE))
table(cols) # colors appear 94, 92, 88, 93 times

I tried building a for-loop that samples until the exact numbers are reached with if(table(cols)[1:4] == 92), but it didn't work and after doing a lot of research, I still don't know how to proceed. I would be really thankful for tips and help!

CodePudding user response：

You can try this. The idea is to create the sorted vector first (seqc). Then for each iteration, the algorithm sample one value out of the possible values (i.e. all except the previous one in the vector).

seqc <- rep(values, each = 92)
vec <- sample(seqc, 1)
seqc <- seqc[-match(vec, seqc)]
for (i in 2:368){
  vec[i] <- sample(seqc[seqc != vec[i - 1]], 1)
  seqc <- seqc[-match(vec[i], seqc)]
}

output

head(vec)
# [1] "red"    "blue"   "red"    "yellow" "blue"   "yellow"

table(vec)
#vec
#  blue  green    red yellow 
#    92     92     92     92

It might throw an error, because the algorithm might not work as expected. In that case, rerun it until it works; it usually takes no more than 3 iterations for it to work.

CodePudding user response：

Since we must have 92 elements of each color, I'd say we can just put them all together in one vector, shuffle them, and then solve the conflicts (i.e. colors next to each other) by putting duplicated color elsewhere (e.g. at the end of the vector).

set.seed(1) 

# Declare all the colors we need
colors_total=rep(c("red","blue","green","yellow"),each=92)
# Shuffle them
col_vector=sample(colors_total)

# Create a vector to know if each color has a duplicate on the right
side_by_side<-(col_vector[-length(col_vector)]==col_vector[-1])
# Find the first duplicate
index_first_side_by_side=which(side_by_side)

# While there are duplicates, do:
while(length(index_first_side_by_side)>0){
  # Get the color to put elsewhere
  duplicated_color=col_vector[index_first_side_by_side]
  
  # Check the colors from the duplicate element to the end of col_vector
  # If it is always the same unique color, we cannot just put the duplicate element in the end
  # in order to avoid infinite loop
  if(length(unique(col_vector[index_first_side_by_side:length(col_vector)]))==1){
    # Decide a random place to put the duplicate
    r=runif(n=1,min=1,max=index_first_side_by_side-1)
    # Remove it from col_vector
    col_vector=c(col_vector[-index_first_side_by_side])
    # Add it at position r
    col_vector=c(col_vector[1:(r-1)],
                 duplicated_color,
                 col_vector[(r 1):length(col_vector)])
  } else { 
    # Otherwise can put duplicate in the end of vector
    col_vector=c(col_vector[-index_first_side_by_side],duplicated_color)
  }
  # Check if we have same colors side by side now
  side_by_side<-(col_vector[-length(col_vector)]==col_vector[-1])
  index_first_side_by_side=which(side_by_side)
}

col_vector