Say I have a population of 1000 patients, each with a recorded sex. I've been asked to draw a sample of size n in which exactly 65% of the patients are male.
Some sample data (here, the sex distribution is 50%-50%):
data <- data.frame(patient_id = 1:1000,
sex = append(rep("male", 500),
rep("female", 500))
)
I can't really see a way to solve this task using sample_n or sample_frac in dplyr.
For n = 500, the result should look something like this, but with random patient_ids:
data.frame(patient_id = 1:500,
sex = append(rep("male", 325),
rep("female", 175))
)
Any insight is appreciated.
CodePudding user response:
We can filter each sex separately, sample from each subset, and combine the results with bind_rows. First, let's store the row counts in variables so that the percentage is easy to change:
library(tidyverse)
number_of_sample <- 500
male_pct <- 0.65
# Round so the count is a whole number for any sample size
number_of_male <- round(number_of_sample * male_pct)
number_of_female <- number_of_sample - number_of_male
# Set the seed for reproducibility
set.seed(4)
sampled_data <- data %>%
  filter(sex == 'male') %>%
  sample_n(size = number_of_male) %>%
  bind_rows(data %>%
              filter(sex == 'female') %>%
              sample_n(size = number_of_female))
Checking the numbers:
sampled_data %>%
group_by(sex) %>%
summarise(count=n())
# A tibble: 2 x 2
sex count
<chr> <int>
1 female 175
2 male 325
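The same idea can be wrapped in a small function. This is only a sketch: `sample_by_sex` is a hypothetical helper (not part of dplyr), and it uses `slice_sample`, the newer dplyr replacement for `sample_n`. The female count is taken as the remainder so the two counts always add up to n.

```r
library(dplyr)

# Example population from the question: 500 males, 500 females.
data <- data.frame(patient_id = 1:1000,
                   sex = rep(c("male", "female"), each = 500))

# Hypothetical helper: draw n rows with a fixed share of males.
sample_by_sex <- function(df, n, male_pct) {
  n_male <- round(n * male_pct)  # whole number of males
  n_female <- n - n_male         # remainder, so the counts sum to n
  # Fail early if a group is too small to sample without replacement.
  stopifnot(n_male <= sum(df$sex == "male"),
            n_female <= sum(df$sex == "female"))
  bind_rows(
    slice_sample(filter(df, sex == "male"), n = n_male),
    slice_sample(filter(df, sex == "female"), n = n_female)
  )
}

set.seed(4)
sampled <- sample_by_sex(data, n = 500, male_pct = 0.65)
count(sampled, sex)
```

Calling `sample_by_sex(data, n = 200, male_pct = 0.5)` would reuse the same logic for a different size and ratio.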
CodePudding user response:
Another tidyverse option.
library(dplyr)
n <- 150
# Shuffle whole rows so that each patient_id stays paired with its own sex
df <- slice_sample(data, prop = 1)
view <- filter(df, sex == 'male')[1:round(n*0.65),] %>%
bind_rows(filter(df, sex == 'female')[1:round(n*0.35),])
Counting the rows gives us:
count(view, sex)
# sex n
# 1 female 52
# 2 male 98
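One thing to watch with the shuffle-and-take-the-first-rows approach: rounding the two counts independently is not guaranteed to add up to n for every sample size. A minimal base R sketch, assuming the same example data, rounds only the male count and takes the female count as the remainder. It also shuffles entire rows, so each patient_id keeps its original sex.

```r
# Example population from the question: 500 males, 500 females.
data <- data.frame(patient_id = 1:1000,
                   sex = rep(c("male", "female"), each = 500))

n <- 150
n_male <- round(n * 0.65)  # round once
n_female <- n - n_male     # remainder guarantees n_male + n_female == n

set.seed(1)
# Shuffle whole rows (not just the ID column).
shuffled <- data[sample(nrow(data)), ]

# Take the first rows of each sex from the shuffled data.
picked <- rbind(
  head(shuffled[shuffled$sex == "male", ], n_male),
  head(shuffled[shuffled$sex == "female", ], n_female)
)

table(picked$sex)
```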
CodePudding user response:
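This is an alternative solution that nests the data within a single pipeline. Note that the proportions passed to sample_frac are fractions of each sex group, not of the final sample: they give 325 and 175 here only because each group has 500 rows. They would need to be changed if the population isn't split 50/50 or if a different n is wanted.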
library(tidyverse)
sampled_data = data %>%
group_by(sex) %>%
nest() %>%
ungroup() %>%
mutate(prop = c(0.65, 0.35)) %>%
mutate(samples = map2(data, prop, sample_frac)) %>%
select(-data, - prop) %>%
unnest(samples)
sampled_data %>% count(sex)
# A tibble: 2 × 2
sex n
<fct> <int>
1 female 175
2 male 325
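If the population is not balanced, one option is to work with target counts per group instead of per-group fractions. The sketch below assumes a made-up population `pop` with 300 males and 700 females (not the data from the question) and uses `group_modify` with `slice_sample` to draw a different number of rows from each group.

```r
library(dplyr)

# Hypothetical unbalanced population: 300 males, 700 females.
pop <- data.frame(patient_id = 1:1000,
                  sex = rep(c("male", "female"), times = c(300, 700)))

n <- 200
n_male <- round(n * 0.65)
# Named vector of target counts, looked up by group key below.
targets <- c(male = n_male, female = n - n_male)

set.seed(1)
sampled <- pop %>%
  group_by(sex) %>%
  # .x is the group's rows, .y is a one-row tibble holding the group key.
  group_modify(~ slice_sample(.x, n = targets[[as.character(.y$sex)]])) %>%
  ungroup()

count(sampled, sex)
```

Looking the counts up by name avoids relying on the order in which the groups happen to appear.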