Filtering for rows with unique values in two different columns in R. Verifying if a code is correct

Time:10-06

I have a question. I am working in R, and I was given code that looks like this:

mydata %>%  
filter(group %in% sample(unique(group), size = 4)) %>%  
group_by(group) %>%  
slice_sample(n = 1) %>%  
ungroup()

However, I have now run into a problem with my data that I did not foresee: I need to repeat the filtering step from the second line of the code for another column. I don't know whether that is valid, so I need to know if code like this is correct:

rndmData <- mydata %>% 
filter(group %in% sample(unique(group), size = 4)) %>% 
group_by(group) %>% 
filter(col_3 %in% sample(unique(col_3), size = 4)) %>% 
slice_sample(n = 1) %>% 
ungroup()

I need to sample my data so that each row of the output has a unique value in the group column and a unique value in the col_3 column. The code above runs without an error, but I still don't know if it is correct.

For more clarity, my data looks like this... 

group  col_2  col_3   col_4
A      p_m     12      21
A      q_x     11      21
A      i_z     13      22
B      q_z     11      24
B      p_x     14      25
B      i_m     15      26
B      q_m     17      28
C      p_x     16      29
C      i_z     12      23
C      q_m     14      23
C      q_x     13      25 
D      p_z     11      25
D      i_z     15      26
D      q_m     17      28
D      q_x     14      29
E      p_x     13      30
E      i_m     15      26
E      q_m     17      28
E      p_x     16      29
F      i_z     12      23
F      q_x     13      25 
F      p_z     11      25
F      i_z     15      26
G      q_m     17      28
G      q_z     11      24
G      p_x     14      25
G      i_m     15      26
H      q_x     11      21
H      i_z     13      22
H      q_z     11      24
H      p_x     13      30

So my desired result should be something like this ...

group  col_2  col_3   col_4
A      i_z     13      22
H      q_z     11      24
D      q_m     17      28
F      i_z     15      26

Here the group letter is not repeated, and the value in col_3 is not repeated either.

Is the above code correct?

UPDATE: the code is not correct. It does not give unique values for both columns.
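
A quick way to check whether a sampled result has this property is sketched below. The helper name `is_unique_sample` is hypothetical, and the two small data frames are copied from the example tables in this question, not produced by the code above.

```r
# Hypothetical helper: TRUE only if no value is repeated in `group`
# and no value is repeated in `col_3`.
is_unique_sample <- function(df) {
  anyDuplicated(df$group) == 0 && anyDuplicated(df$col_3) == 0
}

# Invalid result: groups differ, but col_3 repeats the value 13.
bad <- data.frame(
  group = c("A", "H"),
  col_2 = c("i_z", "i_z"),
  col_3 = c(13, 13),
  col_4 = c(22, 22)
)

# Desired result: neither group nor col_3 repeats.
good <- data.frame(
  group = c("A", "H", "D", "F"),
  col_2 = c("i_z", "q_z", "q_m", "i_z"),
  col_3 = c(13, 11, 17, 15),
  col_4 = c(22, 24, 28, 26)
)

is_unique_sample(bad)   # FALSE
is_unique_sample(good)  # TRUE
```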

The desired output is a result with unique groups and unique values in col_3. A col_3 value must be unique across the whole result, regardless of group. So if the code selects

group  col_2  col_3   col_4
A      i_z     13      22

as its first row, then it cannot select a row from group H that has 13 in col_3, as in:

group  col_2  col_3   col_4
A      i_z     13      22
H      i_z     13      22

Because the value in col_3 must not be repeated, after taking the first row the code should give me a second, third and fourth row, each with a unique group and a unique col_3 value.

Something like this...

group  col_2  col_3   col_4
A      i_z     13      22
H      q_z     11      24
D      q_m     17      28
F      i_z     15      26

Here the value of group is not repeated, and the value in col_3 is not repeated either.

CodePudding user response:

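Here is one possible solution. It samples five groups, keeps one random row per col_3 value, and then keeps one random row per group, so neither column repeats. Note that it can return fewer rows than the number of groups sampled, because a group may lose all of its rows in the col_3 step.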

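library(dplyr)

set.seed(1234)  # fix the seed so the sample is reproducible
randmData <- mydata %>% 
  # keep only the rows of 5 randomly chosen groups
  filter(group %in% sample(unique(group), size = 5)) %>% 
  # keep one random row per col_3 value, so col_3 cannot repeat
  group_by(col_3) %>% 
  slice_sample(n = 1) %>% 
  ungroup() %>% 
  # keep one random row per remaining group, so group cannot repeat
  group_by(group) %>% 
  slice_sample(n = 1) %>% 
  ungroup()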
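
If exactly four rows are required, an alternative is to pick the rows one at a time and skip any col_3 value that has already been used. The following is only a sketch of that idea in base R, not the method used in the answer above. The function name `sample_unique_pairs` is hypothetical, and `mydata` is rebuilt here from the table in the question so the block runs on its own. The approach is greedy, so on other data it may stop early even when a valid selection exists; it warns in that case.

```r
# Rebuild the example data from the question.
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
B q_z 11 24
B p_x 14 25
B i_m 15 26
B q_m 17 28
C p_x 16 29
C i_z 12 23
C q_m 14 23
C q_x 13 25
D p_z 11 25
D i_z 15 26
D q_m 17 28
D q_x 14 29
E p_x 13 30
E i_m 15 26
E q_m 17 28
E p_x 16 29
F i_z 12 23
F q_x 13 25
F p_z 11 25
F i_z 15 26
G q_m 17 28
G q_z 11 24
G p_x 14 25
G i_m 15 26
H q_x 11 21
H i_z 13 22
H q_z 11 24
H p_x 13 30
")

# Hypothetical helper: draw up to n rows so that no group and no
# col_3 value appears more than once in the result.
sample_unique_pairs <- function(df, n) {
  picked <- df[0, ]
  # Visit the groups in random order; each group is used at most once.
  for (g in sample(unique(df$group))) {
    # Rows of this group whose col_3 value has not been taken yet.
    candidates <- which(df$group == g & !(df$col_3 %in% picked$col_3))
    if (length(candidates) == 0) next
    # Index via sample.int() to avoid the sample(x) pitfall when length(x) == 1.
    chosen <- candidates[sample.int(length(candidates), 1)]
    picked <- rbind(picked, df[chosen, ])
    if (nrow(picked) == n) break
  }
  if (nrow(picked) < n) warning("fewer than n rows could be selected")
  picked
}

set.seed(1234)
result <- sample_unique_pairs(mydata, n = 4)
print(result)
```

Visiting each group only once guarantees unique groups, and the `%in%` filter guarantees unique col_3 values, so the result needs no further deduplication.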

CodePudding user response:

I see that the values of col_3 are not repeated within the groups. Filtering on col_3 and not on the group may do the trick.
