I am looking for a function that works with tidyverse/dplyr and lets me select or sample a subset of groups from a data frame.
df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
n = 1:9)
df %>%
  group_by(fruit) %>%
  group_sample(n = 2)
# group_sample() is the function I am looking for. It should return a subset of
# two groups: apple + pear, apple + orange, or pear + orange.
That way, when I want to test code that runs over every group, I do not have to run it on the entire dataset, or manually pick a few combinations of the grouping variables and filter for them.
Answer:
This will get all elements (rows) of 2 random groups:
library(tidyverse)
set.seed(1337)
df <- data.frame(
fruit = rep(c("apple", "pear", "orange"), 3),
n = 1:9
)
df %>%
  nest(data = -fruit) %>%
  sample_n(2) %>%
  unnest(cols = data)
#> # A tibble: 6 × 2
#> fruit n
#> <chr> <int>
#> 1 pear 2
#> 2 pear 5
#> 3 pear 8
#> 4 apple 1
#> 5 apple 4
#> 6 apple 7
Created on 2022-04-25 by the reprex package (v2.0.1)
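The same nest/sample/unnest pattern also covers the "combinations of the grouping criteria" case from the question. Here is a rough sketch, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; `df2` and its `size` column are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(tidyr)

# Illustrative data with two grouping columns; `size` is a made-up column.
df2 <- data.frame(
  fruit = rep(c("apple", "pear", "orange"), each = 4),
  size  = rep(c("small", "large"), times = 6),
  n     = 1:12
)

set.seed(1337)
res <- df2 %>%
  nest(data = -c(fruit, size)) %>%  # one row per fruit/size combination
  slice_sample(n = 3) %>%           # pick 3 of the 6 combinations at random
  unnest(cols = data)               # expand back to the original rows
```

Every row of the three sampled fruit/size combinations is kept, so `res` has the same columns as `df2`.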
Answer:
You could filter on a random sample of the unique group values, then group the result:
df %>%
  filter(fruit %in% sample(unique(fruit), 2)) %>%
  group_by(fruit)
#> # A tibble: 6 x 2
#> # Groups: fruit [2]
#> fruit n
#> <chr> <int>
#> 1 pear 2
#> 2 orange 3
#> 3 pear 5
#> 4 orange 6
#> 5 pear 8
#> 6 orange 9
Created on 2022-04-25 by the reprex package (v2.0.1)
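If you want the call from the question to work as written, the same idea can be wrapped in a small helper. This is only a sketch: `group_sample()` is a hypothetical name, not a dplyr function, and it assumes dplyr >= 1.0.0 for `cur_group_id()`.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: keeps every row of `n` randomly
# chosen groups of a grouped data frame.
group_sample <- function(grouped, n) {
  stopifnot(is_grouped_df(grouped), n <= n_groups(grouped))
  sampled_ids <- sample.int(n_groups(grouped), n)   # ids of the groups to keep
  filter(grouped, cur_group_id() %in% sampled_ids)  # result stays grouped
}

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

res <- df %>%
  group_by(fruit) %>%
  group_sample(n = 2)
```

Because it works on group ids rather than on a column name, the helper should behave the same way when the data frame is grouped by several columns.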
Answer:
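You could do this: sample one row from each group, then keep two of those rows. Note that this returns a single row per group rather than all of its rows, and that slice_head(n = 2) always keeps the first two groups in sorted order (here apple and orange); replace it with slice_sample(n = 2) to pick the two groups at random.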
library(tidyverse)
df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
n = 1:9)
df |>
  group_by(fruit) |>
  slice_sample(n = 1) |>
  ungroup() |>
  slice_head(n = 2)
#> # A tibble: 2 × 2
#> fruit n
#> <chr> <int>
#> 1 apple 4
#> 2 orange 9
Created on 2022-04-25 by the reprex package (v2.0.1)
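To pick the groups at random and keep all of their rows, one option is to sample the distinct group keys and join back. A minimal sketch, assuming dplyr >= 1.0.0 for `slice_sample()` and R >= 4.1 for `|>`; `picked` is just an illustrative name.

```r
library(dplyr)

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

# One row per group, then a random pick of two of those groups.
picked <- df |>
  distinct(fruit) |>
  slice_sample(n = 2)

# Keep all rows that belong to the picked groups.
res <- df |>
  semi_join(picked, by = "fruit") |>
  group_by(fruit)
```

Call `set.seed()` first if the same two groups should come back on every run.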