I have the following code that samples 1 row 5 times, 2 rows 5 times, 3 rows 5 times, and so on. After running the lapply and converting the result to a data frame for comparisons, I need a way to turn the ID variable into a grouping variable. Rows 1:5 of "want" would be "group 1", rows 6:15 would be "group 2", rows 16:30 would be "group 3", and so on. These are the groupings because each ID in group 1 has one row, each ID in group 2 has two rows, each ID in group 3 has three rows, and so on.
Code
library(dplyr)  # provides bind_rows() and %>%
select_rows <- 1:4
n_times <- 5
inds <- nrow(iris)
result <- lapply(select_rows, function(x)
  replicate(n_times, iris[sample(inds, x), ], simplify = FALSE))
want <- bind_rows(result, .id = 'source')
View(want)
For example, if I wanted to run an ANOVA on each column, the ID column alone would not group the observations adequately.
I suppose I could use a combination of ifelse and mutate to assign the rows to groups manually, but I hope to avoid this because I will need to do it for several data frames of varying sizes.
I also tried the following code to assign groups over a sequence, but realized it wouldn't work because the numbers of observations in each group are not the same:
final<- want %>% mutate(Group = rep(seq(1,ceiling(nrow(want)/5)),each = 5))
Any help would be appreciated.
CodePudding user response:
Use the times argument to rep to get five 1's, ten 2's, fifteen 3's, etc.
dat$id <- rep(1:3, times=1:3*5)
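The same idea can be written with the question's own variables instead of hard-coded numbers. This is only a sketch: the seed is an assumption added for reproducibility, and base R (unlist plus rbind) stands in for bind_rows so that it runs without dplyr.

```r
# Rebuild the question's data, then label groups with rep(times = ...).
set.seed(42)  # assumption: seed added only so the sketch is reproducible
select_rows <- 1:4
n_times <- 5
inds <- nrow(iris)

result <- lapply(select_rows, function(x)
  replicate(n_times, iris[sample(inds, x), ], simplify = FALSE))

# Flatten the nested list into 20 data frames and stack them
# (base R stand-in for bind_rows).
samples <- unlist(result, recursive = FALSE)
want <- do.call(rbind, samples)

# Group k holds n_times samples of select_rows[k] rows each,
# so it spans select_rows[k] * n_times rows.
want$Group <- rep(seq_along(select_rows), times = select_rows * n_times)

table(want$Group)
```

With select_rows <- 1:4 and n_times <- 5 the groups should contain 5, 10, 15 and 20 rows. Because the sizes are computed from select_rows and n_times, the same line should carry over to other data frames built the same way.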
CodePudding user response:
Here's another option using findInterval and cumsum:
want$grp <- findInterval(seq_len(nrow(want)), cumsum(n_times * select_rows) + 1)
Output
source Sepal.Length Sepal.Width Petal.Length Petal.Width Species grp
1 1 5.4 3.7 1.5 0.2 setosa 0
2 2 4.8 3.1 1.6 0.2 setosa 0
3 3 6.1 2.8 4.7 1.2 versicolor 0
4 4 6.7 3.3 5.7 2.5 virginica 0
5 5 5.4 3.4 1.7 0.2 setosa 0
6 6 5.5 2.4 3.8 1.1 versicolor 1
7 6 6.0 2.2 5.0 1.5 virginica 1
8 7 5.0 3.5 1.6 0.6 setosa 1
9 7 5.6 2.5 3.9 1.1 versicolor 1
10 8 5.2 3.4 1.4 0.2 setosa 1
11 8 6.2 3.4 5.4 2.3 virginica 1
12 9 6.7 3.1 4.7 1.5 versicolor 1
13 9 5.1 3.3 1.7 0.5 setosa 1
14 10 7.7 3.8 6.7 2.2 virginica 1
15 10 6.0 3.4 4.5 1.6 versicolor 1
16 11 5.0 3.3 1.4 0.2 setosa 2
17 11 6.6 3.0 4.4 1.4 versicolor 2
18 11 5.6 2.5 3.9 1.1 versicolor 2
19 12 4.4 3.2 1.3 0.2 setosa 2
20 12 6.7 3.3 5.7 2.5 virginica 2
21 12 5.9 3.0 5.1 1.8 virginica 2
22 13 5.8 2.7 4.1 1.0 versicolor 2
23 13 5.4 3.4 1.5 0.4 setosa 2
24 13 5.5 2.4 3.8 1.1 versicolor 2
25 14 5.6 2.5 3.9 1.1 versicolor 2
26 14 6.0 2.2 5.0 1.5 virginica 2
27 14 5.7 3.8 1.7 0.3 setosa 2
28 15 6.2 3.4 5.4 2.3 virginica 2
29 15 6.5 3.2 5.1 2.0 virginica 2
30 15 5.8 2.7 4.1 1.0 versicolor 2
31 16 5.1 3.5 1.4 0.2 setosa 3
32 16 5.8 2.7 5.1 1.9 virginica 3
33 16 4.3 3.0 1.1 0.1 setosa 3
34 16 4.6 3.2 1.4 0.2 setosa 3
35 17 4.9 3.0 1.4 0.2 setosa 3
36 17 5.4 3.0 4.5 1.5 versicolor 3
37 17 6.7 3.1 4.7 1.5 versicolor 3
38 17 6.3 2.3 4.4 1.3 versicolor 3
39 18 4.7 3.2 1.3 0.2 setosa 3
40 18 6.5 2.8 4.6 1.5 versicolor 3
41 18 4.9 3.1 1.5 0.1 setosa 3
42 18 6.4 2.7 5.3 1.9 virginica 3
43 19 6.1 3.0 4.9 1.8 virginica 3
44 19 6.2 2.8 4.8 1.8 virginica 3
45 19 4.8 3.1 1.6 0.2 setosa 3
46 19 5.9 3.0 5.1 1.8 virginica 3
47 20 5.4 3.9 1.7 0.4 setosa 3
48 20 6.2 2.9 4.3 1.3 versicolor 3
49 20 6.4 2.7 5.3 1.9 virginica 3
50 20 6.3 3.4 5.6 2.4 virginica 3
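Note that the output above numbers the groups from 0. A minimal sketch of how the breakpoints work, and how to shift the result to the 1-based labels the question asks for, might look like this (the names ends, grp0, grp1 and by_rep are illustrative, not from the original answers):

```r
select_rows <- 1:4
n_times <- 5

# Last row index of each group: 5, 15, 30, 50
ends <- cumsum(n_times * select_rows)
n <- ends[length(ends)]

# Breakpoints at ends + 1 (6, 16, 31, 51): rows 1-5 fall before the first
# breakpoint and get 0, rows 6-15 get 1, and so on.
grp0 <- findInterval(seq_len(n), ends + 1)

# Shift by one for labels that start at 1.
grp1 <- grp0 + 1L

# Cross-check against the rep(times = ...) approach.
by_rep <- rep(seq_along(select_rows), times = select_rows * n_times)
all(grp1 == by_rep)
```

Both approaches should give the same labels once the offset is applied, so the choice between them is mostly a matter of taste.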