I am working in R and I have a dataset that looks like this:
group col_2 col_3 col_4
A p_m 12 21
A q_x 11 21
A i_z 13 22
A q_z 11 24
A p_x 14 25
A i_m 15 26
A q_m 17 28
A p_x 16 29
A i_z 12 23
A q_m 14 23
A q_x 13 25
A p_z 11 25
A i_z 15 26
A q_m 17 28
A q_x 14 29
A p_x 13 30
A i_m 15 26
A q_m 17 28
A p_x 16 29
A i_z 12 23
A q_x 13 25
A p_z 11 25
A i_z 15 26
A q_m 17 28
A q_z 11 24
A p_x 14 25
A i_m 15 26
A q_x 11 21
A i_z 13 22
A q_z 11 24
A p_x 13 30
A i_m 15 26
A q_m 17 28
A p_x 16 29
A i_z 12 23
I need to randomly select 12 rows based on col_2: 6 random rows whose col_2 value starts with "p" and 6 random rows whose col_2 value starts with "q". I have tried different things with sample_n, but I can't find a way to select exactly 6 rows from each of the two codes.
Any help would be great.
CodePudding user response:
First, make the data reproducible with dput():
dta <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A"), col_2 = c("p_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m",
"p_x", "i_z", "q_m", "q_x", "p_z", "i_z", "q_m", "q_x", "p_x",
"i_m", "q_m", "p_x", "i_z", "q_x", "p_z", "i_z", "q_m", "q_z",
"p_x", "i_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", "p_x",
"i_z"), col_3 = c(12L, 11L, 13L, 11L, 14L, 15L, 17L, 16L, 12L,
14L, 13L, 11L, 15L, 17L, 14L, 13L, 15L, 17L, 16L, 12L, 13L, 11L,
15L, 17L, 11L, 14L, 15L, 11L, 13L, 11L, 13L, 15L, 17L, 16L, 12L
), col_4 = c(21L, 21L, 22L, 24L, 25L, 26L, 28L, 29L, 23L, 23L,
25L, 25L, 26L, 28L, 29L, 30L, 26L, 28L, 29L, 23L, 25L, 25L, 26L,
28L, 24L, 25L, 26L, 21L, 22L, 24L, 30L, 26L, 28L, 29L, 23L)), class = "data.frame", row.names = c(NA,
-35L))
Now identify the rows whose col_2 value begins with "p" or "q" and draw the samples (the draw is random, so the rows you get will differ from the ones shown here):
psam <- sample(which(strtrim(dta$col_2, 1) == "p"), 6)
dta[psam, ]
# group col_2 col_3 col_4
# 26 A p_x 14 25
# 22 A p_z 11 25
# 16 A p_x 13 30
# 12 A p_z 11 25
# 5 A p_x 14 25
# 19 A p_x 16 29
qsam <- sample(which(strtrim(dta$col_2, 1) == "q"), 6)
dta[qsam, ]
# group col_2 col_3 col_4
# 10 A q_m 14 23
# 11 A q_x 13 25
# 21 A q_x 13 25
# 33 A q_m 17 28
# 14 A q_m 17 28
# 30 A q_z 11 24
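If you want both samples in a single data frame, the two steps above can be wrapped in one small function. This is only a sketch in base R: sample_by_prefix is a hypothetical helper name (not part of any package), and the seed value is arbitrary and only there so the draw can be repeated. It assumes the column is a character vector, as it is in dta.

```r
# Data copied from the dput() output above
dta <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A"), col_2 = c("p_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m",
"p_x", "i_z", "q_m", "q_x", "p_z", "i_z", "q_m", "q_x", "p_x",
"i_m", "q_m", "p_x", "i_z", "q_x", "p_z", "i_z", "q_m", "q_z",
"p_x", "i_m", "q_x", "i_z", "q_z", "p_x", "i_m", "q_m", "p_x",
"i_z"), col_3 = c(12L, 11L, 13L, 11L, 14L, 15L, 17L, 16L, 12L,
14L, 13L, 11L, 15L, 17L, 14L, 13L, 15L, 17L, 16L, 12L, 13L, 11L,
15L, 17L, 11L, 14L, 15L, 11L, 13L, 11L, 13L, 15L, 17L, 16L, 12L
), col_4 = c(21L, 21L, 22L, 24L, 25L, 26L, 28L, 29L, 23L, 23L,
25L, 25L, 26L, 28L, 29L, 30L, 26L, 28L, 29L, 23L, 25L, 25L, 26L,
28L, 24L, 25L, 26L, 21L, 22L, 24L, 30L, 26L, 28L, 29L, 23L)), class = "data.frame", row.names = c(NA,
-35L))

# Hypothetical helper: draw n random rows (without replacement) for each prefix
sample_by_prefix <- function(df, col, prefixes, n) {
  picks <- lapply(prefixes, function(p) {
    # Row positions whose value in `col` starts with the prefix p
    idx <- which(startsWith(df[[col]], p))
    if (length(idx) < n) {
      stop("fewer than ", n, " rows start with '", p, "'")
    }
    # Index through sample.int() so a single candidate row is handled
    # correctly (sample(x, 1) with a length-1 numeric x samples from 1:x)
    idx[sample.int(length(idx), n)]
  })
  df[unlist(picks), , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only so the draw is repeatable
res <- sample_by_prefix(dta, "col_2", c("p", "q"), 6)

# Counts per first letter: "p" and "q" should each appear 6 times
table(substr(res$col_2, 1, 1))
```

The result has the 6 "p" rows first, followed by the 6 "q" rows. Drop the set.seed() call if you want a fresh draw every time.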