This question is a follow up question from this question here
I have a population of 6 categories (stratum) and I want in each stratum to take the 10% as a sample. Doing so I take:
var = c(rep("A",10),rep("B",10),rep("C",3),rep("D",5),"E","F");var
value = rnorm(30)
dat = tibble(var,value);
pop=dat%>%group_by(var)
pop
singleallocperce = slice_sample(pop, prop=0.1);
singleallocperce
with result:
# A tibble: 2 x 2
# Groups: var [2]
var value
<chr> <dbl>
1 A -1.54
2 B -1.12
But I want even if in some stratum that the polupation inside them cannot reach the taken sample of 10% to take at least one observation.As has been answered previously the correct way is :
dat %>%
group_by(var) %>%
mutate(min = if_else(n() * 0.1 >= 1, n() * 0.1, 1),
random = sample(n())) %>%
filter(random <= min) |>
select(var, value)
But now additionally I want to make proportional allocation sampling (ie with weight proportional to the subpopulation of each stratum for example for A the weight will be : 10/30,for B: 10/30,for C:3/30,D:5/30 etc and then the weights w_i to be multiplied with the subpopulation of the stratum ) keeping the constrain of 1 observation if the subpopulation does not meet that requirement.
The proportional allocation scheme is as described below:
w = dat%>%group_by(var)%>%summarise(w= n()/nrow(.));w
dat%>%
group_by(var)%>%
summarise(Ni=n())%>%
left_join(w,by="var")%>%
mutate(sample_per_group=Ni*w)
with result
# A tibble: 6 × 4
var Ni w sample_per_group
<chr> <int> <dbl> <dbl>
1 A 10 0.333 3.33
2 B 10 0.333 3.33
3 C 3 0.1 0.3
4 D 5 0.167 0.833
5 E 1 0.0333 0.0333
6 F 1 0.0333 0.0333
the column sample_per_group contains the sample number per group (proportional to the subpopulation number).But I want to keep the constrain of sample_per_group <1 to give me one observation.
How can I do this using dplyr package ? Any help?
CodePudding user response:
Maybe this should work
library(dplyr)
dat %>%
group_by(var) %>%
mutate(w = n()/nrow(.), w = if_else(w <= 0.2, 1, w)) %>%
group_modify(~ .x %>%
slice_sample(prop = first(.$w))) %>%
slice(if(first(w) == 1) 1 else row_number()) %>%
ungroup
-output
# A tibble: 10 × 3
var value w
<chr> <dbl> <dbl>
1 A 0.811 0.333
2 A 0.363 0.333
3 A 0.355 0.333
4 B 0.366 0.333
5 B -0.979 0.333
6 B -0.809 0.333
7 C -0.536 1
8 D 1.06 1
9 E 0.832 1
10 F 0.937 1