Say I have something like the following:
df <- data.frame (ID = c("2330", "2331", "2333", "2334", "2336", "2337", "4430", "4431", "4510", "4511"), length = c(8.4,6,3,9,3,4,1,7,4,2))
> df
ID length
1 2330 8.4
2 2331 6.0
3 2333 3.0
4 2334 9.0
5 2336 3.0
6 2337 4.0
7 4430 1.0
8 4431 7.0
9 4510 4.0
10 4511 2.0
IDs that form a pair are within +/- 1 of each other. In my example the pairs are (2330, 2331), (2333, 2334), (2336, 2337), (4430, 4431), and (4510, 4511). I would like to randomly sample one ID from each pair to get a data frame that looks like the following:
> df
ID length
1 2330 8.4
2 2334 9.0
3 2336 3.0
4 4430 1.0
5 4510 4.0
How would I accomplish this with base R? Thank you.
CodePudding user response:
We may create a grouping column with gl for every 2 adjacent elements and then use slice_sample with n = 1
library(dplyr)
df %>%
group_by(grp = as.integer(gl(n(), 2, n()))) %>%
slice_sample(n = 1) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 5 × 2
ID length
<chr> <dbl>
1 2330 8.4
2 2333 3
3 2337 4
4 4430 1
5 4510 4
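To see what the grouping step does, here is a small sketch of the gl call on its own, assuming 10 rows as in the example data. The variable names n and grp are only for illustration.

```r
n <- 10                          # number of rows in the example df
grp <- as.integer(gl(n, 2, n))   # n levels, each repeated twice, cut to length n
print(grp)                       # group ids: 1 1 2 2 3 3 4 4 5 5
```

This grouping is purely positional: it assumes the rows are ordered so that the two members of each pair sit next to each other.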
Or using base R
do.call(rbind, lapply(split(df, gl(nrow(df), 2, nrow(df)),
drop = TRUE), function(x) x[sample(nrow(x), 1),]))
-output
ID length
1 2330 8.4
2 2333 3.0
3 2337 4.0
4 4430 1.0
5 4510 4.0
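Or with aggregate in base R. Note that aggregate applies FUN to each column separately, so calling sample on ID and length directly would draw them independently and could pair an ID with the other row's length. Sampling a row index per group keeps the rows intact
tmp <- transform(df, idx = seq_along(ID), grp = cumsum(c(TRUE,
diff(as.numeric(as.character(ID))) != 1)))
df[aggregate(idx ~ grp, tmp, FUN = function(i) i[sample.int(length(i), 1)])$idx, ]
-output (one possible draw)
ID length
2 2331 6
4 2334 9
6 2337 4
8 4431 7
9 4510 4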
Or with tapply
df[with(df, tapply(seq_along(ID), rep(seq_along(ID), each = 2,
length.out = nrow(df)), FUN = sample, 1)),]
ID length
1 2330 8.4
4 2334 9.0
5 2336 3.0
7 4430 1.0
10 4511 2.0
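Since the result is random, it can help to fix the seed and check that every sampled row is an intact row of df. Below is a minimal base R sketch, assuming the IDs are sorted so that pair members are adjacent. The helper name sample_one_per_pair is hypothetical and not part of any package.

```r
df <- data.frame(
  ID = c("2330", "2331", "2333", "2334", "2336", "2337", "4430", "4431", "4510", "4511"),
  length = c(8.4, 6, 3, 9, 3, 4, 1, 7, 4, 2)
)

# Hypothetical helper: keep one randomly chosen row from each run of consecutive IDs.
sample_one_per_pair <- function(d) {
  id  <- as.numeric(as.character(d$ID))
  grp <- cumsum(c(TRUE, diff(id) != 1))   # start a new group whenever the ID gap is not 1
  # sample.int(length(i), 1) avoids the sample(x, 1) surprise when a group has a single row
  pick <- vapply(split(seq_len(nrow(d)), grp),
                 function(i) i[sample.int(length(i), 1)], integer(1))
  d[pick, ]
}

set.seed(42)   # arbitrary seed, only to make the draw repeatable
res <- sample_one_per_pair(df)

# One row per pair, and each ID still carries its own length
stopifnot(nrow(res) == 5)
stopifnot(all(res$length == df$length[match(res$ID, df$ID)]))
```

Grouping by the ID gap rather than by row position means a run of three consecutive IDs would be treated as one group, so this sketch fits data that really does come in pairs.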