Sample randomly within cutoff in tibble (R)

I have a tibble with 100 points in R, below:

preds <- tibble(x=1:100, y=seq(from=0.01,to=1,by=0.01))

And I want to randomly sample 20 observations with values less than 0.5. Currently, I can select the first 20 observations by:

number_of_likely_negatives<-20

likely_negatives <- preds %>% 
    arrange(y) %>% 
    slice(1:number_of_likely_negatives)

But how can I randomly select 20 observations with y values below 0.5?

CodePudding user response：

We can filter on the 'y' values before slicing, then sample row positions from all of the filtered rows using n():

likely_negatives <- preds %>% 
    arrange(y) %>% 
    filter(y < 0.5) %>%
    slice(sample(n(), number_of_likely_negatives, replace = FALSE))

We can also use slice_sample, which draws the random rows directly:

preds %>% 
   arrange(y) %>%
   filter(y < 0.5) %>% 
   slice_sample(n = number_of_likely_negatives)
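
For a repeatable draw, the slice_sample approach above can be wrapped with a seed and a quick sanity check. This is only a sketch: the seed value 42 is an arbitrary assumption, and the arrange step is left out because the sampled rows come back in random order anyway.

```r
library(dplyr)

preds <- tibble(x = 1:100, y = seq(from = 0.01, to = 1, by = 0.01))
number_of_likely_negatives <- 20

set.seed(42)  # arbitrary seed, used only so the same rows are drawn each run

likely_negatives <- preds %>%
  filter(y < 0.5) %>%                             # keep rows below the cutoff
  slice_sample(n = number_of_likely_negatives)    # sample without replacement

# Sanity checks: right number of rows, all below the cutoff, no repeats
stopifnot(
  nrow(likely_negatives) == number_of_likely_negatives,
  all(likely_negatives$y < 0.5),
  anyDuplicated(likely_negatives$x) == 0
)
```

Dropping the set.seed call gives a different sample on every run, which is usually what is wanted outside of testing.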

CodePudding user response：

You can use the following code:

library(dplyr)
sample_n(preds[preds$y < 0.5,], 20)

Output:

# A tibble: 20 × 2
       x     y
   <int> <dbl>
 1    42  0.42
 2    18  0.18
 3    44  0.44
 4    17  0.17
 5     7  0.07
 6    38  0.38
 7    23  0.23
 8    27  0.27
 9    20  0.2 
10     6  0.06
11    35  0.35
12    11  0.11
13     9  0.09
14    34  0.34
15    30  0.3 
16    29  0.29
17    39  0.39
18     3  0.03
19    13  0.13
20    47  0.47

CodePudding user response：

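A direct answer, assuming the rows are sorted by increasing y and that threshold holds the cutoff (e.g. threshold <- 0.5):

preds %>% 
  slice(
    sample.int(which(y >= threshold)[1] - 1, size = number_of_likely_negatives, replace = FALSE)
  )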
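
If the rows are not guaranteed to be sorted by y, the index-based idea above can be sketched by collecting the eligible row positions first. The names threshold, eligible and picked_rows are assumptions introduced for illustration, not part of the original answers.

```r
library(tibble)

preds <- tibble(x = 1:100, y = seq(from = 0.01, to = 1, by = 0.01))
threshold <- 0.5                      # assumed cutoff, taken from the question
number_of_likely_negatives <- 20

# Row positions whose y is below the cutoff; no sorting required
eligible <- which(preds$y < threshold)

# Index into `eligible` instead of calling sample(eligible, ...), because
# sample() treats a single number k as 1:k when only one row qualifies
picked_rows <- eligible[sample.int(length(eligible), size = number_of_likely_negatives)]

likely_negatives <- preds[picked_rows, ]
```

sample.int samples without replacement by default, so each row appears at most once; it raises an error if fewer than number_of_likely_negatives rows are eligible.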