I am working on a project where products in production have a defect, but only in very rare cases. For example, 1 in 1,000,000 products has a defect.
How could I generate data in R, Python, or Excel that would represent samples from this distribution?
CodePudding user response:
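In R you could do: sample(c(1, rep(0, 1e6 - 1)), size = 10)
You can adjust the size argument as needed. With size = 10 you'll get 10 samples: [1] 0 0 0 0 0 0 0 0 0 0
With a probability of 1/1e6, it'll take a while before you see a 1.
Note that sample() draws without replacement by default, so this models picking 10 products from a batch of 1,000,000 that contains exactly one defect. Add replace = TRUE if you want independent draws.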
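Since the question also mentions Python, here is a minimal sketch of the same idea using only the standard library. It assumes each product is defective independently with probability 1/1,000,000; the function names sample_products and defect_positions are made up for illustration. The second function draws the gaps between defects from a geometric distribution, so it does not have to loop over every single product.

```python
import math
import random

DEFECT_RATE = 1e-6  # 1 defect per 1,000,000 products, as in the question


def sample_products(n, p=DEFECT_RATE, rng=random):
    """Return n independent 0/1 draws, where 1 marks a defective product."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def defect_positions(n, p=DEFECT_RATE, rng=random):
    """Return the 0-based indices of defective products among n products.

    Assumes 0 < p < 1. The gap to the next defect is geometric, so it is
    drawn by inverse transform instead of testing every product.
    """
    positions = []
    i = -1
    log_q = math.log1p(-p)  # log(1 - p), accurate for tiny p
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        gap = int(math.log(u) / log_q) + 1  # geometric on 1, 2, 3, ...
        i += gap
        if i >= n:
            return positions
        positions.append(i)


if __name__ == "__main__":
    print(sample_products(10))             # almost always ten zeros
    print(defect_positions(10_000_000))    # about 10 indices on average
```

Pass rng=random.Random(some_seed) to either function if you need reproducible data.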
CodePudding user response:
If you use the tidyverse in R, you can do this:
library(tidyverse)
set.seed(1)
tibble(
  group = c(1, 1, 2, 2, 2),
  id = seq_len(5)
) %>%
  group_by(group) %>%
  add_count(name = "group_size") %>%
  slice_sample(n = min(.$group_size)) %>%
  select(-group_size)
#> # A tibble: 4 × 2
#> # Groups: group [2]
#> group id
#> <dbl> <int>
#> 1 1 1
#> 2 1 2
#> 3 2 5
#> 4 2 3
Created on 2022-07-05 by the reprex package (v2.0.1)
This uses the size of the smaller group as n and draws a sample of that size from the other group. Instead of min(.$group_size), you can also set a fixed number if you prefer.
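For comparison, the same downsampling idea can be sketched in plain Python. This is only an illustration, not a translation of the dplyr internals: the helper name downsample_to_smallest and its key and n parameters are assumptions made for this example.

```python
import random
from collections import defaultdict


def downsample_to_smallest(rows, key, n=None, rng=random):
    """Sample the same number of rows from every group.

    rows: iterable of records; key: function returning a row's group.
    If n is None, the size of the smallest group is used.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    if not groups:
        return []
    if n is None:
        n = min(len(members) for members in groups.values())
    sampled = []
    for members in groups.values():
        # Sample without replacement; never ask for more than the group has.
        sampled.extend(rng.sample(members, min(n, len(members))))
    return sampled


if __name__ == "__main__":
    data = [
        {"group": 1, "id": 1},
        {"group": 1, "id": 2},
        {"group": 2, "id": 3},
        {"group": 2, "id": 4},
        {"group": 2, "id": 5},
    ]
    # Two rows per group, because the smaller group has two rows.
    print(downsample_to_smallest(data, key=lambda r: r["group"]))
```

Passing n explicitly plays the role of replacing min(.$group_size) with a fixed number.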