For every 3 rows, move one random row out of the 3 into a test data frame (R)

Time:08-18

I have a data frame (df) in which every 3 consecutive rows form a biological triplicate.

First, for every 3 rows, I'd like to randomly select 1 row out of the 3, remove it from df, and put it in df_test.

CodePudding user response:

library(dplyr)
df_test <- df %>%
  # rows 1-3 get grp 0, rows 4-6 get grp 1, and so on
  group_by(grp = (row_number()-1) %/% 3) %>%
  # pick one random row from each group of 3
  slice_sample(n = 1) %>%
  ungroup()
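
The snippet above builds df_test but leaves df unchanged, while the question also asks for the picked rows to be taken out of df. A minimal sketch of one way to do that, assuming dplyr >= 1.0.0; the example data and the names row_id, df_tagged and df_train are hypothetical and not part of the original answer:

```r
library(dplyr)

set.seed(42)  # only to make the example reproducible

# Hypothetical example data: 9 rows, i.e. 3 triplicates
df <- data.frame(sample_id = 1:9, value = rnorm(9))

# Tag every row with a unique row id and its triplicate group
df_tagged <- df %>%
  mutate(row_id = row_number(),
         grp = (row_number() - 1) %/% 3)

# One random row per triplicate goes to the test set
df_test <- df_tagged %>%
  group_by(grp) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Every row that was not picked stays behind
df_train <- anti_join(df_tagged, df_test, by = "row_id")
```

Assigning df_train back to df would mimic "taking the rows out" of the original data frame; the helper columns row_id and grp can be dropped afterwards if they are not needed.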

CodePudding user response:

You should be able to sample all rows at once. If each group is a block of n consecutive rows, draw a random offset from 0:(n-1) for each block and add it to the index of the block's first row, seq(1, nrow(df), n).

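n <- 3  # rows per block
s <- seq(1, nrow(df), n)  # index of the first row of each block
# replace = TRUE draws each offset independently and also works with more than n blocks
df[sample(0:(n-1), length(s), replace = TRUE) + s, ]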
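
A minimal base R sketch of the full split built on the block-offset idea, under a few assumptions: the example data and the names idx and df_train are hypothetical, and replace = TRUE is passed so that each block gets its own independent offset (without it, sample() fails as soon as there are more blocks than n):

```r
set.seed(1)  # only to make the example reproducible

n <- 3  # rows per triplicate
# Hypothetical example data: 12 rows, i.e. 4 triplicates
df <- data.frame(id = 1:12, value = rnorm(12))

s <- seq(1, nrow(df), by = n)  # first row of each block: 1, 4, 7, 10
# One independent offset in 0:(n-1) per block
idx <- s + sample(0:(n - 1), length(s), replace = TRUE)

df_test  <- df[idx, ]   # one random row per triplicate
df_train <- df[-idx, ]  # the remaining two rows per triplicate
```

Assigning df_train back to df removes the picked rows from the original data frame, as asked in the question.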

Try it with 1000 runs and the distribution of rows selected seems pretty uniform:

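set.seed(1)
n <- 3  # rows per block
df <- data.frame(matrix(1:18, ncol=2))  # 9 rows, so 3 blocks
s <- seq(1, nrow(df), n)
# sample() is called without replace = TRUE here; that works only because there are exactly n blocks
table(replicate(1000, sample(0:(n-1), length(s)) + s))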

#  1   2   3   4   5   6   7   8   9 
#341 329 330 325 344 331 334 327 339 
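
The run above has exactly n blocks, so sample() works there without replace = TRUE. A sketch of the same check for a longer data frame, assuming 10 triplicates (a hypothetical size) and replace = TRUE so that offsets are drawn independently per block; the names n_rows, draws and block_of are illustrative only:

```r
set.seed(1)  # only to make the example reproducible

n <- 3
n_rows <- 30                 # hypothetical: 10 triplicates
s <- seq(1, n_rows, by = n)  # first row of each block

# length(s) x 1000 matrix: one selected row index per block (row) and run (column)
draws <- replicate(1000, s + sample(0:(n - 1), length(s), replace = TRUE))

# Each selected index must fall inside its own block
block_of <- (draws - 1) %/% n + 1
all(block_of == row(draws))

# Relative frequency of each row index; each should be close to 1/n
round(table(draws) / 1000, 2)
```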