R filter subset of groups using dplyr

Time:04-26

I am looking for a function that works with tidyverse / dplyr and allows me to select/sample a subset of groups from a data frame.

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

df %>%
  group_by(fruit) %>%
  group_sample(n = 2)
# --> should return a subset of two groups: apple + pear, apple + orange, or pear + orange

That way, when I want to test code that runs over every group, I do not have to run it on the entire dataset or manually pick a few values of the grouping variable and filter on them.

CodePudding user response:

This will get all elements (rows) of 2 random groups:

library(tidyverse)

set.seed(1337)

df <- data.frame(
  fruit = rep(c("apple", "pear", "orange"), 3),
  n = 1:9
)


df %>%
  nest(data = -fruit) %>%  # one row per fruit; the other columns are packed into `data`
  sample_n(2) %>%          # pick 2 of the nested rows, i.e. 2 whole groups
  unnest(cols = data)      # expand the chosen groups back into their rows
#> # A tibble: 6 × 2
#>   fruit     n
#>   <chr> <int>
#> 1 pear      2
#> 2 pear      5
#> 3 pear      8
#> 4 apple     1
#> 5 apple     4
#> 6 apple     7

Created on 2022-04-25 by the reprex package (v2.0.1)

CodePudding user response:

You could filter on a random sample of the distinct group values:

df %>% 
  filter(fruit %in% sample(unique(fruit), 2)) %>%
  group_by(fruit)
#> # A tibble: 6 x 2
#> # Groups:   fruit [2]
#>   fruit      n
#>   <chr>  <int>
#> 1 pear       2
#> 2 orange     3
#> 3 pear       5
#> 4 orange     6
#> 5 pear       8
#> 6 orange     9

Created on 2022-04-25 by the reprex package (v2.0.1)
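
The same idea can be wrapped into the `group_sample()` verb the question asks for. This is only a minimal sketch: `group_sample()` is a hypothetical helper defined here, not a dplyr function, and it assumes its input has already been grouped with `group_by()`. It draws `n` rows from the table of group keys and keeps every row that belongs to one of the drawn groups.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep all rows of `n` randomly
# chosen groups of a grouped data frame.
group_sample <- function(.data, n) {
  stopifnot(is_grouped_df(.data))
  keys <- group_keys(.data)                   # one row per group
  picked <- slice_sample(keys, n = n)         # draw n groups without replacement
  semi_join(.data, picked, by = names(keys))  # keep only rows of the drawn groups
}

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

set.seed(1)
res <- df %>%
  group_by(fruit) %>%
  group_sample(n = 2)

res
```

Because the sampling happens on the group keys rather than on a single column, the sketch should also work when the data is grouped by more than one variable.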

CodePudding user response:

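You could do this: sample one row from each group, then keep the first two of those rows. Note that this returns a single row per group rather than all of its rows, and that slice_head() always keeps the first two groups in sorted order, so the choice of groups is not random.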

library(tidyverse)

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

df |>
  group_by(fruit) |> 
  slice_sample(n = 1) |> 
  ungroup() |> 
  slice_head(n = 2)
#> # A tibble: 2 × 2
#>   fruit      n
#>   <chr>  <int>
#> 1 apple      4
#> 2 orange     9

Created on 2022-04-25 by the reprex package (v2.0.1)
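
If a random pair of groups is wanted from this approach, one possible variant (a sketch, not the answerer's code) is to replace the final `slice_head()` with a second `slice_sample()`. The `set.seed()` call and the name `res_random` are additions for illustration only.

```r
library(dplyr)

df <- data.frame(fruit = rep(c("apple", "pear", "orange"), 3),
                 n = 1:9)

set.seed(42)  # assumption: fixed seed so the draw can be repeated

res_random <- df %>%
  group_by(fruit) %>%
  slice_sample(n = 1) %>%   # one random row per group
  ungroup() %>%
  slice_sample(n = 2)       # then two of those rows at random

res_random
```

This still yields one row per selected group, so it suits a quick check of which groups were drawn rather than a test run over all rows of those groups.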
