Sampling an equal number of random rows per code within a column in R

Time:09-21

I am working with R and I have a dataset that looks like this...

group  col_2  col_3   col_4
A      p_m     12      21
A      q_x     11      21
A      i_z     13      22
A      q_z     11      24
A      p_x     14      25
A      i_m     15      26
A      q_m     17      28
A      p_x     16      29
A      i_z     12      23
A      q_m     14      23
A      q_x     13      25 
A      p_z     11      25
A      i_z     15      26
A      q_m     17      28
A      q_x     14      29
A      p_x     13      30
A      i_m     15      26
A      q_m     17      28
A      p_x     16      29
A      i_z     12      23
A      q_x     13      25 
A      p_z     11      25
A      i_z     15      26
A      q_m     17      28
A      q_z     11      24
A      p_x     14      25
A      i_m     15      26
A      q_x     11      21
A      i_z     13      22
A      q_z     11      24
A      p_x     13      30
A      i_m     15      26
A      q_m     17      28
A      p_x     16      29
A      i_z     12      23

I need to randomly select 12 rows based on col_2: 6 random rows whose col_2 value starts with "p" and 6 random rows whose col_2 value starts with "q". I have tried different things with sample_n, but I can't find a way to select exactly 6 rows from each of the two codes.

Any help would be great.

CodePudding user response:

First make the data reproducible with dput():

dta <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A"), col_2 = c("p_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", 
"p_x", "i_z", "q_m", "q_x", "p_z", "i_z", "q_m", "q_x", "p_x", 
"i_m", "q_m", "p_x", "i_z", "q_x", "p_z", "i_z", "q_m", "q_z", 
"p_x", "i_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", "p_x", 
"i_z"), col_3 = c(12L, 11L, 13L, 11L, 14L, 15L, 17L, 16L, 12L, 
14L, 13L, 11L, 15L, 17L, 14L, 13L, 15L, 17L, 16L, 12L, 13L, 11L, 
15L, 17L, 11L, 14L, 15L, 11L, 13L, 11L, 13L, 15L, 17L, 16L, 12L
), col_4 = c(21L, 21L, 22L, 24L, 25L, 26L, 28L, 29L, 23L, 23L, 
25L, 25L, 26L, 28L, 29L, 30L, 26L, 28L, 29L, 23L, 25L, 25L, 26L, 
28L, 24L, 25L, 26L, 21L, 22L, 24L, 30L, 26L, 28L, 29L, 23L)), class = "data.frame", row.names = c(NA, 
-35L))

Now find the row indices whose col_2 value begins with "p" or "q" and sample 6 from each set. The rows drawn will differ from run to run unless you call set.seed() first:

psam <- sample(which(strtrim(dta$col_2, 1) == "p"), 6)
dta[psam, ]
#    group col_2 col_3 col_4
# 26     A   p_x    14    25
# 22     A   p_z    11    25
# 16     A   p_x    13    30
# 12     A   p_z    11    25
# 5      A   p_x    14    25
# 19     A   p_x    16    29
qsam <- sample(which(strtrim(dta$col_2, 1) == "q"), 6)
dta[qsam, ]
#    group col_2 col_3 col_4
# 10     A   q_m    14    23
# 11     A   q_x    13    25
# 21     A   q_x    13    25
# 33     A   q_m    17    28
# 14     A   q_m    17    28
# 30     A   q_z    11    24
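
If a single 12-row result is more convenient than two separate samples, the same idea can be written as one step. The following is a minimal base R sketch, not the only way to do it. The seed value and the names `prefix`, `idx_by_prefix`, `picked`, and `result` are assumptions made for illustration, and the data frame is rebuilt here with only `group` and `col_2` so the block runs on its own.

```r
set.seed(42)  # assumption: any fixed seed, used only to make the draw repeatable

# Stand-in for dta: the same 35 col_2 codes as above, other columns omitted
dta <- data.frame(
  group = "A",
  col_2 = c("p_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", "p_x", "i_z",
            "q_m", "q_x", "p_z", "i_z", "q_m", "q_x", "p_x", "i_m", "q_m",
            "p_x", "i_z", "q_x", "p_z", "i_z", "q_m", "q_z", "p_x", "i_m",
            "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", "p_x", "i_z"),
  stringsAsFactors = FALSE
)

# First character of each code, and a flag for the two prefixes of interest
prefix <- substr(dta$col_2, 1, 1)
keep   <- prefix %in% c("p", "q")

# Row indices grouped by prefix: one integer vector for "p", one for "q"
idx_by_prefix <- split(which(keep), prefix[keep])

# Draw 6 row indices from each group without replacement.
# sample.int() on the length avoids sample()'s surprise when a group has one row.
picked <- unlist(
  lapply(idx_by_prefix, function(i) i[sample.int(length(i), 6)]),
  use.names = FALSE
)

# Combined result: 6 "p" rows followed by 6 "q" rows
result <- dta[picked, ]
table(substr(result$col_2, 1, 1))
```

This fails with an error if either prefix has fewer than 6 rows, which is probably the desired behaviour when exactly 6 per code are required. With the data above there are 10 "p" rows and 14 "q" rows, so both draws succeed.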