Randomly flagging records within a group using dplyr

I have a table and would like to randomly flag three records from each group with a 1 and all other records with a 0.

I know that I can accomplish this using the following code, but this seems clunky and inefficient. Is there any other way I can accomplish the same thing?

library(tidyverse)
dat <- data.frame(row_id = 1:10,
           grp = c(rep("a", 5), rep("b", 5)))

dat_sample <- dat %>% 
  group_by(grp) %>% 
  sample_n(3) %>% 
  mutate(val = 1)

dat %>% 
  left_join(dat_sample, by = c("row_id", "grp")) %>% 
  mutate(val = coalesce(val, 0))

CodePudding user response：

One option is to use mutate instead of a join: grouped by 'grp', sample from row_number() and build a logical vector with %in%, which is then coerced to binary with as.integer or, as here, the unary +

library(dplyr)
dat %>%
   group_by(grp) %>% 
   mutate(val = +(row_number() %in% sample(row_number(), 3))) %>%
   ungroup()
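
If some groups may have fewer than three rows, sample(row_number(), 3) would fail for them. A minimal sketch of one way around that, assuming it is acceptable to flag every row of such a group; the data frame dat_small, the group "c", and the seed are hypothetical and only there for illustration:

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the draw is reproducible

# Hypothetical data: group "c" has only two rows
dat_small <- data.frame(row_id = 1:12,
                        grp = c(rep("a", 5), rep("b", 5), rep("c", 2)))

flagged <- dat_small %>%
  group_by(grp) %>%
  # sample(n(), k) draws k distinct positions from 1..n();
  # min(n(), 3) caps k at the group size
  mutate(val = as.integer(row_number() %in% sample(n(), min(n(), 3)))) %>%
  ungroup()

# Number of flagged rows per group
per_group <- count(flagged, grp, wt = val)
per_group
```

Whatever the seed, per_group should show 3 flagged rows for "a" and "b" and 2 for "c".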

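Or perhaps, if the number of flagged rows per group does not have to be exactly three, flag each row independently (here with probability 0.3)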

dat %>%
    group_by(grp) %>%
    mutate(val = rbinom(n(), 1, 0.3)) %>%
    ungroup
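
Note that rbinom() flags each row on its own, so the count per group is random rather than fixed. A small sketch to see this, assuming the same dat as in the question (the seed and the counts name are just for illustration):

```r
library(dplyr)

set.seed(1)  # assumed seed, only for reproducibility

dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

counts <- dat %>%
  group_by(grp) %>%
  mutate(val = rbinom(n(), 1, 0.3)) %>%
  # each row is flagged independently, so the total per group is not fixed
  summarise(n_flagged = sum(val))

counts
```

Depending on the draw, n_flagged can be anywhere from 0 to 5 for a group, so this only fits when an approximate share of flagged rows is good enough.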

CodePudding user response：

Here is a possible data.table solution, which also makes use of sample:

library(data.table)
dt <- as.data.table(dat)

dt[, C3 := +(1:.N %in% sample(.N, min(.N, 3))), by = grp]

Output (which rows are flagged will differ from run to run unless a seed is set)

   row_id grp C3
 1:      1   a  1
 2:      2   a  0
 3:      3   a  1
 4:      4   a  0
 5:      5   a  1
 6:      6   b  0
 7:      7   b  1
 8:      8   b  1
 9:      9   b  0
10:     10   b  1
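
To double-check that each group really gets three flags, a quick sketch along these lines could be used; the seed is an assumption for reproducibility, and as.integer() is used in place of the unary + purely for readability:

```r
library(data.table)

set.seed(123)  # assumed seed, only so the draw is reproducible

dt <- data.table(row_id = 1:10,
                 grp = c(rep("a", 5), rep("b", 5)))

# Flag up to three randomly chosen positions within each group
dt[, C3 := as.integer(seq_len(.N) %in% sample(.N, min(.N, 3))), by = grp]

# Count the flagged rows per group
check <- dt[, .(n_flagged = sum(C3)), by = grp]
check
```

With five rows in each group, n_flagged should be 3 for both "a" and "b".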