Randomly sampling 1 ID from each pair in column

Time:02-03

Say I have something like the following:

df <- data.frame (ID  = c("2330", "2331", "2333", "2334", "2336", "2337", "4430", "4431", "4510", "4511"), length = c(8.4,6,3,9,3,4,1,7,4,2))

> df
     ID length
1  2330    8.4
2  2331    6.0
3  2333    3.0
4  2334    9.0
5  2336    3.0
6  2337    4.0
7  4430    1.0
8  4431    7.0
9  4510    4.0
10 4511    2.0

IDs that are in a pair are within +/- 1 of each other. (2330, 2331), (2333, 2334), (2336, 2337), (4430, 4431), and (4510, 4511) are the pairs in my example. I would like to randomly sample one ID from each pair to get a data frame that looks like the following:

> df
     ID length
1  2330    8.4
2  2334    9.0
3  2336    3.0
4  4430    1.0
5  4510    4.0

How would I accomplish this with base R? Thank you.

CodePudding user response:

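We can create a grouping column with gl() that labels every 2 adjacent rows as one group, and then use slice_sample() with n = 1 to draw one row per group. This assumes the rows are ordered so that the two members of each pair sit next to each other.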

library(dplyr)
df %>% 
  group_by(grp = as.integer(gl(n(), 2, n()))) %>% 
  slice_sample(n = 1) %>%
  ungroup %>%
  select(-grp)

-output

# A tibble: 5 × 2
  ID    length
  <chr>  <dbl>
1 2330     8.4
2 2333     3  
3 2337     4  
4 4430     1  
5 4510     4  
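To make the grouping step concrete, here is a minimal sketch of the labels that gl() produces for a 10-row data frame. The variable names grp and grp_alt are only for illustration; the arithmetic alternative is an assumption of mine and is not part of the original answer.

```r
# Positional pair labels: rows (1, 2) -> 1, rows (3, 4) -> 2, and so on.
n <- 10
grp <- as.integer(gl(n, 2, n))
print(grp)

# An equivalent label built with integer division, for comparison.
grp_alt <- (seq_len(n) + 1L) %/% 2L
print(identical(grp, grp_alt))
```

Both vectors depend only on row position, so they are valid pair labels only when the data frame is sorted with pair members adjacent.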

Or using base R

do.call(rbind, lapply(split(df, gl(nrow(df), 2, nrow(df)),
   drop = TRUE), function(x) x[sample(nrow(x), 1),]))

-output

    ID length
1 2330    8.4
2 2333    3.0
3 2337    4.0
4 4430    1.0
5 4510    4.0

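Or with aggregate in base R, grouping on runs of consecutive IDs rather than on row position. Note that aggregate applies FUN to each column separately, so calling sample on ID and length directly can pair an ID with the length of the other row in its pair. Sampling a row index per group keeps each row intact.

grp <- cumsum(c(TRUE, diff(as.numeric(df$ID)) != 1))
idx <- aggregate(row ~ grp, data.frame(row = seq_len(nrow(df)), grp = grp),
     FUN = function(i) i[sample.int(length(i), 1)])$row
df[idx, ]

This returns one complete row of df per pair and keeps the original row names.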

Or with tapply, sampling one row index from each pair of adjacent rows

df[with(df, tapply(seq_along(ID), rep(seq_along(ID), each = 2, 
    length.out = nrow(df)), FUN = sample, 1)),]
     ID length
1  2330    8.4
4  2334    9.0
5  2336    3.0
7  4430    1.0
10 4511    2.0
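The base R approaches above could be wrapped in a small helper, sketched below. The function name sample_one_per_pair and its id_col argument are hypothetical and do not come from the original answer. The sketch assumes the IDs are sorted, that pair members differ by exactly 1, and that different pairs are separated by a gap larger than 1, as in the example data.

```r
# Hypothetical helper: keep one randomly chosen row per pair of consecutive IDs.
sample_one_per_pair <- function(df, id_col = "ID") {
  ids <- as.numeric(df[[id_col]])
  # Start a new group whenever an ID does not follow its predecessor by exactly 1.
  grp <- cumsum(c(TRUE, diff(ids) != 1))
  # Draw one row index per group; sample.int() also behaves for one-row groups.
  keep <- vapply(split(seq_len(nrow(df)), grp),
                 function(i) i[sample.int(length(i), 1)],
                 integer(1))
  df[keep, , drop = FALSE]
}

df <- data.frame(ID = c("2330", "2331", "2333", "2334", "2336",
                        "2337", "4430", "4431", "4510", "4511"),
                 length = c(8.4, 6, 3, 9, 3, 4, 1, 7, 4, 2))

set.seed(42)  # only needed when a reproducible draw is wanted
res <- sample_one_per_pair(df)
print(res)
```

Because whole row indices are sampled, each ID stays with its own length. Unlike the positional gl() grouping, a missing partner here simply yields a one-row group instead of shifting every later pair.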