I am working with R, and my data looks similar to this...
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
B q_z 11 24
B p_x 14 25
B i_m 15 26
B q_m 17 28
C p_x 16 29
C i_z 12 23
C q_m 14 23
C q_x 13 25
D p_z 11 25
D i_z 15 26
D q_m 17 28
D q_x 14 29
E p_x 13 30
E i_m 15 26
E q_m 17 28
E p_x 16 29
F i_z 12 23
F q_x 13 25
F p_z 11 25
F i_z 15 26
G q_m 17 28
G q_z 11 24
G p_x 14 25
G i_m 15 26
H q_x 11 21
H i_z 13 22
H q_z 11 24
H p_x 13 30
I need to randomly select 4 rows so that each row comes from a different group. In other words, my output should not contain two observations that belong to the same group.
For example, a valid result could look like this:
group col_2 col_3 col_4
A i_z 13 22
H i_z 13 22
D q_m 17 28
F p_z 11 25
I have tried things like this.
set.seed(1234)
rndmData <- mydata %>%
  sample_n(5)

set.seed(1234)
rndmData <- mydata %>%
  sample_n(distinct(group), 5)

set.seed(1234)
rndmData <- mydata %>%
  sample_n(unique(group), 5)
However, none of them led me to the desired result.
Any help would be great.
Answer:
Sample 4 groups, then sample one row from within each group:
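library(dplyr)

mydata %>%
  # keep only the rows of 4 randomly chosen groups
  filter(group %in% sample(unique(group), size = 4)) %>%
  group_by(group) %>%
  # draw one random row from each remaining group
  slice_sample(n = 1) %>%
  ungroup()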
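If you would rather not depend on dplyr, the same two-step idea (pick 4 groups, then one row per group) can be sketched in base R. This is only an illustration: it assumes the data frame is called mydata as in the question and rebuilds it from the table posted there, and pick_one is a hypothetical helper name, not part of any package.

```r
# Rebuild the example data from the question (assumed to be named `mydata`).
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
B q_z 11 24
B p_x 14 25
B i_m 15 26
B q_m 17 28
C p_x 16 29
C i_z 12 23
C q_m 14 23
C q_x 13 25
D p_z 11 25
D i_z 15 26
D q_m 17 28
D q_x 14 29
E p_x 13 30
E i_m 15 26
E q_m 17 28
E p_x 16 29
F i_z 12 23
F q_x 13 25
F p_z 11 25
F i_z 15 26
G q_m 17 28
G q_z 11 24
G p_x 14 25
G i_m 15 26
H q_x 11 21
H i_z 13 22
H q_z 11 24
H p_x 13 30
")

set.seed(1234)

# Step 1: choose 4 distinct groups at random.
chosen <- sample(unique(mydata$group), size = 4)

# Step 2: for one group, return the index of one randomly chosen row.
# (pick_one is a hypothetical helper.) sample.int() on the length is used
# instead of sample(idx, 1), because sample() treats a single number n
# as the range 1:n, which would misbehave for a group with only one row.
pick_one <- function(g) {
  idx <- which(mydata$group == g)
  idx[sample.int(length(idx), 1)]
}

rows <- vapply(chosen, pick_one, integer(1))
result <- mydata[rows, ]
result
```

The result always has 4 rows with 4 different values in the group column; which rows you get depends on the seed. Note that this requires at least 4 distinct groups, otherwise sample() stops with an error.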