I have a tibble with 100 points in R:
library(dplyr)
preds <- tibble(x = 1:100, y = seq(from = 0.01, to = 1, by = 0.01))
I want to randomly sample 20 observations whose y values are less than 0.5. Currently, I can select the 20 observations with the smallest y values with:
number_of_likely_negatives <- 20
likely_negatives <- preds %>%
  arrange(y) %>%
  slice(1:number_of_likely_negatives)
But how can I randomly select 20 observations with y values below 0.5?
Answer:
We can filter the y values before slicing, and then sample row positions from all of the remaining rows (n() is the number of rows left after filtering):
likely_negatives <- preds %>%
  filter(y < 0.5) %>%
  slice(sample(n(), number_of_likely_negatives, replace = FALSE))
We can also use slice_sample():
preds %>%
  filter(y < 0.5) %>%
  slice_sample(n = number_of_likely_negatives)
Answer:
You can use the following code:
library(dplyr)
# sample_n() still works, but it is superseded by slice_sample() since dplyr 1.0.0
sample_n(preds[preds$y < 0.5, ], 20)
Output (one random draw; your rows will differ):
# A tibble: 20 × 2
       x     y
   <int> <dbl>
 1    42  0.42
 2    18  0.18
 3    44  0.44
 4    17  0.17
 5     7  0.07
 6    38  0.38
 7    23  0.23
 8    27  0.27
 9    20  0.2
10     6  0.06
11    35  0.35
12    11  0.11
13     9  0.09
14    34  0.34
15    30  0.3
16    29  0.29
17    39  0.39
18     3  0.03
19    13  0.13
20    47  0.47
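For comparison, the same selection can be written in base R without dplyr. This sketch uses a plain data.frame so it runs without any packages (the same row indexing also works on a tibble); `eligible` is just an illustrative variable name.

```r
# Base R only: no packages needed
preds <- data.frame(x = 1:100, y = seq(from = 0.01, to = 1, by = 0.01))

# Row indices of every observation with y below the cutoff
eligible <- which(preds$y < 0.5)

# Pick 20 positions within `eligible` without replacement, then subset rows.
# sample.int() on the length avoids sample()'s special case where a
# length-1 numeric vector is treated as 1:x.
likely_negatives <- preds[eligible[sample.int(length(eligible), 20)], ]
```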
Answer:
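A direct answer, sampling from the indices of the rows whose y is below a threshold:
threshold <- 0.5
preds %>%
  slice(
    # which() returns the positions of all rows below the threshold;
    # sample them without replacement so no row is picked twice
    sample(which(y < threshold), size = number_of_likely_negatives, replace = FALSE)
  )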
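sample() and sample.int() stop with an error ("cannot take a sample larger than the population when 'replace = FALSE'") if fewer rows qualify than you ask for. One way to handle that is a small wrapper that caps the request at the size of the eligible pool. `sample_below()` below is a hypothetical helper written for illustration, not a function from base R or dplyr, and it assumes the column of interest is named `y`.

```r
# Hypothetical helper (not from any package): draw up to `n` distinct rows
# of `df` whose `y` column is below `threshold`.
sample_below <- function(df, threshold, n) {
  eligible <- which(df$y < threshold)
  # Cap the request so sampling without replacement cannot ask for more
  # rows than actually qualify
  size <- min(n, length(eligible))
  df[eligible[sample.int(length(eligible), size)], , drop = FALSE]
}

preds <- data.frame(x = 1:100, y = seq(from = 0.01, to = 1, by = 0.01))
likely_negatives <- sample_below(preds, threshold = 0.5, n = 20)
```

The design choice here is to return fewer rows rather than fail; if a short sample should be treated as a bug in your pipeline, letting sample.int() raise its error may be preferable.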