I have a question. I am working in R, and I was given code that looks like this:
mydata %>%
  filter(group %in% sample(unique(group), size = 4)) %>%
  group_by(group) %>%
  slice_sample(n = 1) %>%
  ungroup()
However, I have now run into a problem with my data that I did not foresee: I need to repeat the operation from the second line of the code (the filter step) for another column. I don't know whether that is valid, so I would like to know whether code like this is correct:
rndmData <- mydata %>%
  filter(group %in% sample(unique(group), size = 4)) %>%
  group_by(group) %>%
  filter(col_3 %in% sample(unique(col_3), size = 4)) %>%
  slice_sample(n = 1) %>%
  ungroup()
So the problem is that I need to sample my data so that the output has a unique value in the group column and a unique value in the col_3 column in every row. The code above runs without an error, but I still don't know if it is correct.
For more clarity, my data looks like this...
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
B q_z 11 24
B p_x 14 25
B i_m 15 26
B q_m 17 28
C p_x 16 29
C i_z 12 23
C q_m 14 23
C q_x 13 25
D p_z 11 25
D i_z 15 26
D q_m 17 28
D q_x 14 29
E p_x 13 30
E i_m 15 26
E q_m 17 28
E p_x 16 29
F i_z 12 23
F q_x 13 25
F p_z 11 25
F i_z 15 26
G q_m 17 28
G q_z 11 24
G p_x 14 25
G i_m 15 26
H q_x 11 21
H i_z 13 22
H q_z 11 24
H p_x 13 30
So my desired result should be something like this ...
group col_2 col_3 col_4
A i_z 13 22
H q_z 11 24
D q_m 17 28
F i_z 15 26
Here the group letter is not repeated, and the value in col_3 is not repeated either.
Is the above code correct?
UPDATE: the code is not correct. It does not give unique values for both columns.
The desired output is a result with unique groups and unique values in col_3. Each col_3 value should be unique across the whole result, regardless of the group. So if the code selects
group col_2 col_3 col_4
A i_z 13 22
as its first row, then it cannot select from group H the row that has 13 in col_3:
group col_2 col_3 col_4
A i_z 13 22
H i_z 13 22
That is not allowed, because the value in col_3 must not be repeated. After taking the first row, the code should give me a second, third and fourth row, each with a group and a col_3 value that have not been used yet.
Something like this...
group col_2 col_3 col_4
A i_z 13 22
H q_z 11 24
D q_m 17 28
F i_z 15 26
Here the value of group is not repeated, and the value in col_3 is not repeated either.
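The row-by-row selection described above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution: `sample_unique_pairs` is a hypothetical helper name (it is not a dplyr function), and `mydata` is re-typed from the table in this question so that the code runs on its own. It draws one row at a time and then discards every remaining row that shares the drawn group or the drawn col_3 value.

```r
library(dplyr)

# The table from the question, re-typed so the sketch is self-contained
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
B q_z 11 24
B p_x 14 25
B i_m 15 26
B q_m 17 28
C p_x 16 29
C i_z 12 23
C q_m 14 23
C q_x 13 25
D p_z 11 25
D i_z 15 26
D q_m 17 28
D q_x 14 29
E p_x 13 30
E i_m 15 26
E q_m 17 28
E p_x 16 29
F i_z 12 23
F q_x 13 25
F p_z 11 25
F i_z 15 26
G q_m 17 28
G q_z 11 24
G p_x 14 25
G i_m 15 26
H q_x 11 21
H i_z 13 22
H q_z 11 24
H p_x 13 30
")

# Hypothetical helper: pick up to `n` rows so that neither `group` nor
# `col_3` repeats, by drawing one row at a time.
sample_unique_pairs <- function(data, n = 4) {
  picked <- data[0, ]      # empty data frame with the same columns
  remaining <- data
  while (nrow(picked) < n && nrow(remaining) > 0) {
    pick <- slice_sample(remaining, n = 1)
    picked <- bind_rows(picked, pick)
    # Drop every row that shares the group or the col_3 value just drawn
    remaining <- filter(remaining, group != pick$group, col_3 != pick$col_3)
  }
  picked
}

set.seed(1234)
rndmData <- sample_unique_pairs(mydata, n = 4)
```

Because the draws are greedy, the loop can run out of candidate rows before reaching `n`, in which case it returns fewer rows than requested, so it is worth checking `nrow(rndmData)` afterwards.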
CodePudding user response:
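Here is one possible solution. It first keeps one row per col_3 value and then one row per group, so neither column repeats in the result. Note that it samples five groups, and the final number of rows can be smaller than that, because a sampled group can lose all of its rows in the col_3 step.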
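library(dplyr)

set.seed(1234)  # make the random draws reproducible
randmData <- mydata %>%
  # keep only the rows of five randomly chosen groups
  filter(group %in% sample(unique(group), size = 5)) %>%
  # keep one random row per col_3 value, so col_3 is unique
  group_by(col_3) %>%
  slice_sample(n = 1) %>%
  ungroup() %>%
  # keep one random row per remaining group, so group is unique
  group_by(group) %>%
  slice_sample(n = 1) %>%
  ungroup()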
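To confirm that a result really has no repeated group or col_3 value, a small check along these lines may help. `has_unique_pairs` is a hypothetical helper name introduced only for illustration, and the two-row data frame is taken from the example rows in the question.

```r
# Hypothetical helper: TRUE when neither column contains a repeated value
has_unique_pairs <- function(data) {
  anyDuplicated(data$group) == 0 && anyDuplicated(data$col_3) == 0
}

# Two rows from the question's table; both have col_3 equal to 13
dupes <- data.frame(group = c("A", "H"), col_3 = c(13, 13))
has_unique_pairs(dupes)  # FALSE, because 13 appears twice in col_3
```

Running `has_unique_pairs(randmData)` together with `nrow(randmData)` shows whether the sampled result meets both requirements and how many rows it contains.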
CodePudding user response:
I see that the values of col_3 are not repeated within the groups. Filtering on col_3 and not on the group may do the trick.