Create a for loop to randomly choose years without replacement in R

I have a vector g <- c(1979:2020) and I would like to create 8 random groups of 4 years each and 2 groups of 5 years each, without replacement. What I do is:

group1 <- sample(g, size = 4)
g1 <- g[!(g %in% group1)]
group2 <- sample(g1, size = 4)
g2 <- g1[!(g1 %in% group2)]
# etc.

Is there a smarter way to do that?

Answer:

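You can shuffle the entire vector with sample and size = length(g), and then split the result into groups: every four elements for the first 8 groups (with gl) and every five elements for the two remaining groups. This needs R 4.1.0 or later, both for the native pipe |> and because older versions of c() return the integer codes of factors instead of combining their levels.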

sample(g, size = length(g)) |>
  split(c(gl(8, 4), gl(2, 5, labels = 9:10)))
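The same shuffle-then-split idea works for any list of group sizes. A minimal base R sketch; `random_groups()` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: shuffle x once, then cut the shuffled vector into
# consecutive chunks whose lengths are given by `sizes`.
random_groups <- function(x, sizes) {
  stopifnot(sum(sizes) == length(x))    # every element lands in exactly one group
  shuffled <- x[sample.int(length(x))]  # a random permutation of x
  # rep(1:k, sizes) labels the first sizes[1] positions 1, the next sizes[2] positions 2, ...
  split(shuffled, rep(seq_along(sizes), sizes))
}

g <- 1979:2020
groups <- random_groups(g, sizes = c(rep(4, 8), 5, 5))
lengths(groups)
```

Using `sample.int(length(x))` rather than `sample(x)` sidesteps a known `sample()` quirk: when `x` is a single number n >= 1, `sample(x)` permutes `1:n` instead of returning `x`.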

Or, in a for loop, you can do:

g <- c(1979:2020)
l <- vector(10, mode = "list")
for (i in seq(10)) {
  # groups 1-8 get 4 years, groups 9-10 get 5
  size <- if (i %in% seq(8)) 4 else 5
  l[[i]] <- sample(g, size = size)
  # remove the years already drawn so they cannot be picked again
  g <- g[!(g %in% unlist(l))]
}

Output (the groups differ on every run, since the draw is random):

> l
[[1]]
[1] 2010 1980 1983 2014

[[2]]
[1] 2019 2004 1990 1997

[[3]]
[1] 1981 1992 1979 2018

[[4]]
[1] 1986 2005 2008 2003

[[5]]
[1] 1987 1984 1996 1985

[[6]]
[1] 1982 1993 2020 2006

[[7]]
[1] 1995 1994 2017 1998

[[8]]
[1] 1989 1999 2012 1991

[[9]]
[1] 2013 2009 2016 2002 2000

[[10]]
[1] 2001 2015 1988 2007 2011
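A variant of the loop driven by a vector of group sizes, followed by a check that the groups really partition the years. A sketch in base R; `sizes` and `remaining` are illustrative names, and `set.seed()` is there only to make a run reproducible:

```r
set.seed(1)                  # only so the draw is reproducible
g <- 1979:2020
sizes <- c(rep(4, 8), 5, 5)  # 8 groups of 4, then 2 groups of 5
remaining <- g
l <- vector("list", length(sizes))
for (i in seq_along(sizes)) {
  # draw without replacement from the years not used yet
  l[[i]] <- remaining[sample.int(length(remaining), sizes[i])]
  remaining <- setdiff(remaining, l[[i]])
}

# every year used exactly once, and nothing left over
stopifnot(length(remaining) == 0,
          !anyDuplicated(unlist(l)),
          setequal(unlist(l), g))
```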

Answer:

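A one-liner, at the price of returning a list of tibbles rather than vectors; if vectors are essential, dplyr::pull() converts each one. Note that each slice_sample() call draws from all 42 years independently, so years are sampled without replacement only within a group: the same year can end up in more than one group.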

library(dplyr)
library(tibble)

lapply(
  c(rep(4, 8), 5, 5), 
  function(x) slice_sample(tibble(year=c(1979:2020)), n=x)
)
[[1]]
# A tibble: 4 × 1
   year
  <int>
1  2004
2  1984
3  1991
4  1996

etc
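A variant that still returns one data frame per group but uses every year exactly once: shuffle the rows a single time, then split them into consecutive chunks. This sketch swaps the tibble for a base R `data.frame` so it runs without packages; `years_df`, `sizes` and `groups` are illustrative names:

```r
# One data frame per group, with every year used exactly once.
years_df <- data.frame(year = 1979:2020)
sizes <- c(rep(4, 8), 5, 5)

# shuffle the rows once ...
shuffled <- years_df[sample.int(nrow(years_df)), , drop = FALSE]
# ... then label consecutive rows 1..10 and split on that label
groups <- split(shuffled, rep(seq_along(sizes), sizes))
groups[[1]]
```

With dplyr, `slice_sample(tibble(year = 1979:2020), prop = 1)` would do the same shuffle step before splitting.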

Answer:

Give each year a random rank, then use cut() to split the ranks at the group boundaries:

yv <- 1979:2020
yv_frame <- data.frame(
  year_vec = yv,
  randorder = sample(length(yv), length(yv), replace = FALSE)
)

num_first_groups <- 8
size_first_groups <- 4
num_second_groups <- 2
size_second_groups <- 5

# check that the group sizes add up to the number of years
stopifnot(length(yv) == (num_first_groups * size_first_groups +
                           num_second_groups * size_second_groups))



library(tidyverse)

(cuts_for_first_regime <- c(1, (1:num_first_groups) * size_first_groups)) # in the example, 8 groups of 4
(cuts_to_use <- c(cuts_for_first_regime, tail(cuts_for_first_regime, 1) +
                    (1:num_second_groups) * size_second_groups)) # plus 2 groups of 5

(calc_df <- arrange(tibble(yv_frame), randorder) |>
  mutate(g_ = cut(randorder,
    breaks = cuts_to_use, right = TRUE,
    include.lowest = TRUE
  ),
  nice_group_names = as.integer(g_)) |> arrange(nice_group_names))

(clean_calc_df <- select(calc_df,
                        year_vec,
                        nice_group_names))

# Going further: a list of data frames, one per group of years
# (split() with a formula needs R 4.1.0 or later)
(my_splits <- split(clean_calc_df,~nice_group_names,
                    drop=TRUE))
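Why these breaks yield 8 groups of 4 and 2 of 5: with `include.lowest = TRUE` and `right = TRUE`, `cut()` builds the intervals [1,4], (4,8], ..., (28,32], (32,37] and (37,42]. Because `randorder` is a permutation of 1 to 42, applying the same breaks to the ranks `1:42` gives the group sizes directly. A small base R check, with the example's group counts hard-coded:

```r
# same breaks as cuts_to_use, with the example's values hard-coded
breaks <- c(1, (1:8) * 4)                         # 1, 4, 8, ..., 32
breaks <- c(breaks, tail(breaks, 1) + (1:2) * 5)  # ..., 37, 42

# each rank 1..42 occurs exactly once in randorder, so counting ranks per
# interval gives the size of each group
grp <- cut(1:42, breaks = breaks, right = TRUE, include.lowest = TRUE)
table(as.integer(grp))
```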