Home > Blockchain >  Sample a proportion of each group with propotional allocation, BUT with a minimum constraint (using
Sample a proportion of each group with propotional allocation, BUT with a minimum constraint (using

Time:06-10

This question is a follow up question from this question here

I have a population of 6 categories (stratum) and I want in each stratum to take the 10% as a sample. Doing so I take:

var = c(rep("A",10),rep("B",10),rep("C",3),rep("D",5),"E","F");var
value = rnorm(30)
dat = tibble(var,value);
pop=dat%>%group_by(var)
pop
singleallocperce = slice_sample(pop, prop=0.1);
singleallocperce

with result:

# A tibble: 2 x 2
# Groups:   var [2]
  var   value
  <chr> <dbl>
1 A     -1.54
2 B     -1.12

But I want even if in some stratum that the polupation inside them cannot reach the taken sample of 10% to take at least one observation.As has been answered previously the correct way is :

dat %>%
  group_by(var) %>%
  mutate(min = if_else(n() * 0.1 >= 1, n() * 0.1, 1),
         random = sample(n())) %>%
  filter(random <= min) |> 
  select(var, value)

But now additionally I want to make proportional allocation sampling (ie with weight proportional to the subpopulation of each stratum for example for A the weight will be : 10/30,for B: 10/30,for C:3/30,D:5/30 etc and then the weights w_i to be multiplied with the subpopulation of the stratum ) keeping the constrain of 1 observation if the subpopulation does not meet that requirement.

The proportional allocation scheme is as described below:

w = dat%>%group_by(var)%>%summarise(w= n()/nrow(.));w         
dat%>%
  group_by(var)%>%
  summarise(Ni=n())%>%
  left_join(w,by="var")%>%
  mutate(sample_per_group=Ni*w)

with result

# A tibble: 6 × 4
  var      Ni      w sample_per_group
  <chr> <int>  <dbl>            <dbl>
1 A        10 0.333            3.33  
2 B        10 0.333            3.33  
3 C         3 0.1              0.3   
4 D         5 0.167            0.833 
5 E         1 0.0333           0.0333
6 F         1 0.0333           0.0333

the column sample_per_group contains the sample number per group (proportional to the subpopulation number).But I want to keep the constrain of sample_per_group <1 to give me one observation.

How can I do this using dplyr package ? Any help?

CodePudding user response:

Maybe this should work

library(dplyr)
dat %>% 
  group_by(var) %>%
  mutate(w = n()/nrow(.), w = if_else(w <= 0.2, 1, w)) %>%
  group_modify(~ .x %>%
         slice_sample(prop = first(.$w))) %>% 
  slice(if(first(w) == 1) 1 else row_number()) %>%
  ungroup

-output

# A tibble: 10 × 3
   var    value     w
   <chr>  <dbl> <dbl>
 1 A      0.811 0.333
 2 A      0.363 0.333
 3 A      0.355 0.333
 4 B      0.366 0.333
 5 B     -0.979 0.333
 6 B     -0.809 0.333
 7 C     -0.536 1    
 8 D      1.06  1    
 9 E      0.832 1    
10 F      0.937 1    
  • Related