I have a table and would like to randomly flag three records from each group with a 1 and all other records with a 0.
I know that I can accomplish this using the following code, but this seems clunky and inefficient. Is there any other way I can accomplish the same thing?
library(tidyverse)

dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

# Sample three rows per group and flag them with 1
dat_sample <- dat %>%
  group_by(grp) %>%
  sample_n(3) %>%
  mutate(val = 1)

# Join the flags back and fill the unsampled rows with 0
dat %>%
  left_join(dat_sample, by = c("row_id", "grp")) %>%
  mutate(val = coalesce(val, 0))
CodePudding user response:
An option is to use mutate instead of a join: grouped by 'grp', sample three values of row_number() and create a logical vector marking the sampled rows, which is coerced to binary with as.integer (or the unary +).
library(dplyr)
dat %>%
  group_by(grp) %>%
  mutate(val = as.integer(row_number() %in% sample(row_number(), 3))) %>%
  ungroup()
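Note that sample(row_number(), 3) stops with an error for any group that has fewer than three rows. A minimal sketch of one way to guard against that, assuming it is acceptable to flag every row in such a group. The data frame dat_small and the seed are made up for illustration and are not part of the question:

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed only so the draw is reproducible

# Hypothetical data: group "c" has only two rows
dat_small <- data.frame(row_id = 1:12,
                        grp = c(rep("a", 5), rep("b", 5), rep("c", 2)))

flagged <- dat_small %>%
  group_by(grp) %>%
  # sample.int(n(), k) draws k positions from 1..n(); k is capped at the group size
  mutate(val = as.integer(row_number() %in% sample.int(n(), min(n(), 3)))) %>%
  ungroup()

# Number of flagged rows per group: 3 for "a", 3 for "b", 2 for "c"
flagged %>% count(grp, wt = val)
```

Capping the sample size with min(n(), 3) mirrors what the data.table answer in this thread does with min(.N, 3).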
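Or perhaps, if an exact count of three per group is not required (this flags each row independently with probability 0.3):
dat %>%
  group_by(grp) %>%
  mutate(val = rbinom(n(), 1, 0.3)) %>%
  ungroup()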
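To see how far the rbinom approach can drift from three flags per group, the sketch below counts the flags per group for one draw and computes the chance that a group of five gets exactly three. The seed and the names bern_counts and p_exact are assumptions added for illustration:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only so the draw is reproducible

dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

# Flags per group under independent Bernoulli(0.3) draws;
# each count can be anything from 0 to 5
bern_counts <- dat %>%
  group_by(grp) %>%
  summarise(n_flagged = sum(rbinom(n(), 1, 0.3)))

bern_counts

# Probability that a group of 5 ends up with exactly 3 flags:
# choose(5, 3) * 0.3^3 * 0.7^2 = 0.1323
p_exact <- dbinom(3, 5, 0.3)
p_exact
```

So with these settings only about 13% of groups of five would receive exactly three flags, which is why the sample-based versions are the ones that match the question as stated.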
CodePudding user response:
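Here is a possible data.table solution, which also makes use of sample:
library(data.table)
dt <- as.data.table(dat)
# Flag up to three randomly chosen rows per group; as.integer turns TRUE/FALSE into 1/0
dt[, C3 := as.integer(seq_len(.N) %in% sample(.N, min(.N, 3))), by = grp]
dt[]  # print the updated table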
Output
    row_id grp C3
 1:      1   a  1
 2:      2   a  0
 3:      3   a  1
 4:      4   a  0
 5:      5   a  1
 6:      6   b  0
 7:      7   b  1
 8:      8   b  1
 9:      9   b  0
10:     10   b  1
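For completeness, the same idea can also be written without any packages. This is a base R sketch using ave, not something taken from the answers above. The seed is an assumption added for reproducibility:

```r
set.seed(123)  # assumption: fixed seed only so the draw is reproducible

dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

# ave() applies FUN to the row indices of each group and returns the
# results in the original row order
dat$val <- ave(seq_len(nrow(dat)), dat$grp, FUN = function(i) {
  k <- min(length(i), 3)  # never ask for more rows than the group has
  as.integer(seq_along(i) %in% sample.int(length(i), k))
})

# Each group should have exactly three rows flagged with 1
tapply(dat$val, dat$grp, sum)
```

Like the dplyr and data.table versions, this samples positions within each group and turns the membership test into a 0/1 column, so no join is needed.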