Randomly select strings based on multiple criteria in R

Time:04-10

I'm trying to select strings based on multiple criteria, but so far without success. My vector contains the following 48 strings: 1_A, 1_B, 1_C, 1_D, 2_A, 2_B, 2_C, 2_D, ..., 12_A, 12_B, 12_C, 12_D.

I need to randomly select 12 strings. The criteria are:

  • I need exactly one string containing each number (1 to 12).
  • I need exactly three strings containing each letter (A to D).

I need the final output to be something like: 1_A, 2_A, 3_A, 4_B, 5_B, 6_B, 7_C, 8_C, 9_C, 10_D, 11_D, 12_D.

Any help will be appreciated.

All the best, Angelica

CodePudding user response:

The trick here is not to use your vector at all, but to create the sample strings from their components, which are randomly chosen according to your criteria.

sample(paste(sample(12), rep(LETTERS[1:4], 3), sep = '_'))
#> [1] "12_D" "8_C"  "7_B"  "1_B"  "6_D"  "5_A"  "4_B"  "10_A" "2_C"  "3_A"  "11_D" "9_C" 

This will give a different result each time.

Note that all 4 letters are always represented exactly 3 times, since we use rep(LETTERS[1:4], 3). All numbers 1 to 12 are present exactly once but in a random order, since we use sample(12). The outer sample() then shuffles the final result, so that neither the order of the letters nor the order of the numbers is predictable.
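
One way to convince yourself of this is to split a drawn sample back into its two parts and count them. The following is only a minimal sketch: the names res, nums and lets are introduced here for illustration, and the set.seed() call is an assumption added so the check can be repeated, not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed only so this check is repeatable
res <- sample(paste(sample(12), rep(LETTERS[1:4], 3), sep = "_"))

nums <- as.integer(sub("_.*", "", res))  # part before the underscore
lets <- sub(".*_", "", res)              # part after the underscore

# every number from 1 to 12 appears exactly once
stopifnot(identical(sort(nums), 1:12))

# all four letters appear, each exactly three times
stopifnot(length(table(lets)) == 4, all(table(lets) == 3))
```

If either stopifnot() call fails, the sample does not meet the criteria; with the construction above it should always pass, whatever the seed.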

If you want the result to give you the indices of your original vector where the samples are from, then it's easy to do that using match. We can recreate your vector by doing:

vec <- paste(rep(1:12, each = 4), rep(LETTERS[1:4], 12), sep = "_")

vec
#>  [1] "1_A"  "1_B"  "1_C"  "1_D"  "2_A"  "2_B"  "2_C"  "2_D"  "3_A"  "3_B" 
#> [11] "3_C"  "3_D"  "4_A"  "4_B"  "4_C"  "4_D"  "5_A"  "5_B"  "5_C"  "5_D" 
#> [21] "6_A"  "6_B"  "6_C"  "6_D"  "7_A"  "7_B"  "7_C"  "7_D"  "8_A"  "8_B" 
#> [31] "8_C"  "8_D"  "9_A"  "9_B"  "9_C"  "9_D"  "10_A" "10_B" "10_C" "10_D"
#> [41] "11_A" "11_B" "11_C" "11_D" "12_A" "12_B" "12_C" "12_D"

And to find the location of the random samples we can do:

samp <- match(sample(paste(sample(12), rep(LETTERS[1:4], 3), sep = '_')), vec)

samp
#>  [1] 30 26 37 43 46 20  8  3 33 24 15  9

So that, for example, you can retrieve an appropriate sample from your vector with:

vec[samp]
#>  [1] "8_B"  "7_B"  "10_A" "11_C" "12_B" "5_D"  "2_D"  "1_C"  "9_A"  "6_D" 
#> [11] "4_C"  "3_A"

Created on 2022-04-10 by the reprex package (v2.0.1)

CodePudding user response:

Allan's solution was exactly what I needed.

For the second part of my project, I need to select 12 more strings. I will create a second sample in the way Allan showed. However, this second sample cannot contain any of the strings that are in the first one.

For example, if 8_B is in the first vector, it can't be in the second one.

Any help will be more than appreciated :)

All the Best, Angelica
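
One possible approach to this follow-up, offered as a sketch rather than a definitive answer, is to reuse the same construction and simply redraw the second sample until it shares no string with the first. The helper draw_sample() and the names first and second are hypothetical, introduced only for this illustration; set.seed() is again there just to make the run repeatable.

```r
# Hypothetical helper: 12 strings, one per number, three per letter
draw_sample <- function() {
  sample(paste(sample(12), rep(LETTERS[1:4], 3), sep = "_"))
}

set.seed(1)  # assumption: fixed seed only for reproducibility of this sketch
first <- draw_sample()

# Redraw until the second sample has no string in common with the first
repeat {
  second <- draw_sample()
  if (!any(second %in% first)) break
}

first
second
```

Because each redraw is an independent valid sample, the accepted second sample still has every number once and every letter three times. A non-overlapping draw is not rare (very roughly one attempt in a few dozen succeeds), so the loop typically finishes almost immediately for a problem of this size.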
