I'm trying to create a separate dataset for each class using nest(). After that, I need to perform some computations that require distinct() to avoid duplicates.
However, when I try, R seems to ignore nest() and just carries on. As a result, I get only the overall results in a single dataframe. How do I get this to work, and why is it failing?
Note: I know that, for the simple example below, I don't need nest() and could use group_by(), but I need nest() for my actual data and am curious why it isn't working.
#Set up and sample data
library(tidyverse)
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))
#Runs but isn't correct
nested_test <- test_data %>%
  nest(data = class) %>%
  distinct(id, gender) %>%
  count(gender)
nested_test
CodePudding user response:
Not 100% sure I understand your goal, but this code nests all columns except class into a separate dataframe for each class. Note you first specify the columns wanted, then the variable(s) to group .by. It then maps over each dataframe, applying distinct() and count() to each.
library(dplyr)
library(tidyr)
library(purrr)
nested_test <- test_data %>%
  nest(data = !class, .by = class) %>%
  mutate(data = map(
    data,
    \(d) count(distinct(d, id, gender), gender)
  ))
Results:
#> nested_test
# A tibble: 3 × 2
  class data
  <chr> <list>
1 h     <tibble [2 × 2]>
2 m     <tibble [2 × 2]>
3 s     <tibble [1 × 2]>
#> nested_test$data
[[1]]
# A tibble: 2 × 2
  gender     n
  <chr>  <int>
1 f          1
2 m          2

[[2]]
# A tibble: 2 × 2
  gender     n
  <chr>  <int>
1 f          1
2 m          1

[[3]]
# A tibble: 1 × 2
  gender     n
  <chr>  <int>
1 f          1
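As for why the original attempt fails: nest(data = class) packs only the class column into the list-column, so id and gender stay as ordinary outer columns. distinct() and count() then work on that outer tibble and never look inside the nested ones, which is why you get one overall result. Here is a rough sketch of that, plus one way to flatten the per-class counts with unnest(). It assumes tidyr 1.3.0 or later (for .by) and R 4.1 or later (for the \(d) shorthand); the names by_id_gender, overall and per_class are just illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

test_data <- tibble(
  id     = c(1, 1, 2, 2, 2, 3, 3, 3),
  class  = c("h", "h", "m", "h", "s", "m", "h", "h"),
  gender = c("m", "m", "f", "f", "f", "m", "m", "m")
)

# nest(data = class) nests ONLY class; id and gender remain outer columns,
# giving one row per id/gender combination.
by_id_gender <- test_data %>% nest(data = class)
names(by_id_gender)

# distinct() and count() operate on the outer tibble, so class plays no
# part in the result: this is a single overall count by gender.
overall <- by_id_gender %>%
  distinct(id, gender) %>%
  count(gender)

# Per-class version: nest everything except class, summarise inside each
# nested tibble, then unnest() to get one flat table.
per_class <- test_data %>%
  nest(data = !class, .by = class) %>%
  mutate(data = map(data, \(d) count(distinct(d, id, gender), gender))) %>%
  unnest(data)

per_class
```

If you need the list-column for later steps, just drop the final unnest() call and keep working with map() on the data column.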