I have a vector g <- c(1979:2020) and I would like to create 8 random groups of 4 years each and 2 groups of 5 years each, without replacement. What I do is:
group1 <- sample(g, size = 4)
g1 <- g[!(g %in% group1)]
group2 <- sample(g1, size = 4)
g2 <- g1[!(g1 %in% group2)]  # and so on for the remaining groups
Is there a smarter way to do that?
CodePudding user response:
You can shuffle the entire vector with sample and size = length(g), and then create the groups by splitting it every four elements for the first 8 groups (with gl) and every five elements for the two remaining groups.
sample(g, size = length(g)) |>
  split(c(gl(8, 4), gl(2, 5, labels = 9:10)))
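The same shuffle-then-split idea can also be written with the group sizes kept in a single vector. This is only a minimal sketch; the names sizes and groups are introduced here for illustration and are not part of the answer:

```r
g <- 1979:2020
sizes <- c(rep(4, 8), 5, 5)           # assumed group sizes; must sum to length(g)
stopifnot(sum(sizes) == length(g))

# Shuffle once, then label each position with a group id repeated per size
groups <- split(sample(g), rep(seq_along(sizes), times = sizes))
```

Because the vector is shuffled only once, no year can land in two groups, and lengths(groups) gives back the requested sizes.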
Or, in a for loop, you can do:
g <- c(1979:2020)
l <- vector("list", 10)
for (i in seq(10)) {
  if (i %in% seq(8)) {
    # groups 1 to 8 get four years each
    l[[i]] <- sample(g, size = 4)
    g <- g[!(g %in% unlist(l))]
  } else if (i %in% 9:10) {
    # groups 9 and 10 get five years each
    l[[i]] <- sample(g, size = 5)
    g <- g[!(g %in% unlist(l))]
  }
}
output:
#> l
[[1]]
[1] 2010 1980 1983 2014
[[2]]
[1] 2019 2004 1990 1997
[[3]]
[1] 1981 1992 1979 2018
[[4]]
[1] 1986 2005 2008 2003
[[5]]
[1] 1987 1984 1996 1985
[[6]]
[1] 1982 1993 2020 2006
[[7]]
[1] 1995 1994 2017 1998
[[8]]
[1] 1989 1999 2012 1991
[[9]]
[1] 2013 2009 2016 2002 2000
[[10]]
[1] 2001 2015 1988 2007 2011
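Because the draw is random, this output will differ on every run. The following is a small sketch of how one might pin the result with set.seed and confirm that the groups use every year exactly once. The seed value and the helper name is_partition are arbitrary choices made here, not part of the answer:

```r
# Hypothetical helper: TRUE if `groups` uses every element of `universe` exactly once
is_partition <- function(groups, universe) {
  drawn <- unlist(groups, use.names = FALSE)
  anyDuplicated(drawn) == 0 && setequal(drawn, universe)
}

set.seed(123)                          # arbitrary seed, only for repeatability
g <- 1979:2020
l <- split(sample(g), rep(1:10, times = c(rep(4, 8), 5, 5)))
ok <- is_partition(l, g)
```

Re-running the block with the same seed should give the same groups, and ok should be TRUE for any valid grouping.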
CodePudding user response:
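A one-liner, at the price of returning tibbles rather than vectors. If a vector is essential, that's an easy conversion. Note that each group is drawn independently from the full range of years, so the same year can appear in more than one group.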
library(dplyr)
library(tibble)
lapply(
  c(rep(4, 8), 5, 5),
  function(x) slice_sample(tibble(year = c(1979:2020)), n = x)
)
[[1]]
# A tibble: 4 × 1
year
<int>
1 2004
2 1984
3 1991
4 1996
etc
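To keep the tibble output while guaranteeing that no year is reused across groups, one option is to shuffle the rows once and then cut the shuffled table into groups. This is a sketch assuming dplyr and tibble are installed; sizes, years and groups are names introduced here:

```r
library(dplyr)
library(tibble)

sizes <- c(rep(4, 8), 5, 5)            # assumed: 8 groups of 4, 2 groups of 5
years <- tibble(year = 1979:2020)
stopifnot(sum(sizes) == nrow(years))

groups <- years |>
  slice_sample(n = nrow(years)) |>                       # shuffle all rows once
  mutate(group = rep(seq_along(sizes), times = sizes)) |>
  group_split(group, .keep = FALSE)                      # list of one-column tibbles
```

Each element of groups is a tibble with a single year column, so the result has the same shape as the lapply version.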
CodePudding user response:
yv <- 1979:2020
yv_frame <- data.frame(
  year_vec = yv,
  randorder = sample(length(yv), length(yv), replace = FALSE)
)
num_first_groups <- 8
size_first_groups <- 4
num_second_groups <- 2
size_second_groups <- 5
# check that the group sizes add up to the number of years
stopifnot(length(yv) == (num_first_groups * size_first_groups +
  num_second_groups * size_second_groups))
library(tidyverse)
(cuts_for_first_regime <- c(1, (1:num_first_groups) * size_first_groups)) # in example 8 groups of 4
(cuts_to_use <- c(cuts_for_first_regime, tail(cuts_for_first_regime, 1) +
  (1:num_second_groups) * size_second_groups)) # plus 2 groups of 5
(calc_df <- arrange(tibble(yv_frame), randorder) |>
  mutate(
    g_ = cut(randorder,
      breaks = cuts_to_use, right = TRUE,
      include.lowest = TRUE
    ),
    nice_group_names = as.integer(g_)
  ) |>
  arrange(nice_group_names))
(clean_calc_df <- select(calc_df, year_vec, nice_group_names))
# going further: if you want a list of data.frames involving the years
(my_splits <- split(clean_calc_df, ~nice_group_names, drop = TRUE))
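The cut points can also be derived from a vector of group sizes with cumsum, which avoids handling the two size regimes separately. This is a base R sketch under that assumption; sizes, breaks, group_id and year_groups are names introduced here:

```r
yv <- 1979:2020
sizes <- c(rep(4, 8), rep(5, 2))       # assumed: 8 groups of 4, then 2 groups of 5
stopifnot(sum(sizes) == length(yv))

# Breaks at 0 followed by the running totals of the sizes;
# cut() uses right-closed intervals by default, so positions 1..4 fall in group 1
breaks <- c(0, cumsum(sizes))
group_id <- cut(seq_along(yv), breaks = breaks, labels = FALSE)

# Assign the shuffled years to the position-based group ids
year_groups <- split(sample(yv), group_id)
```

Changing the layout then only requires editing sizes, for example to three groups of 14.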