Why is nest not giving me multiple datasets and operating on the entire dataframe?

Time:01-31

I'm trying to create a separate dataset for each class using nest(). After that, I need to run some computations on each dataset that require distinct() to avoid counting duplicates.

However, when I try this, R seems to ignore nest() and simply carries on. As a result, I get only the overall results in a single dataframe rather than one result per class. Why is this failing, and how do I get it to work?

Note: I know that, for the below simple example, I don't need to use nest and could use group_by(), but I need nest() for my actual data and am curious why it isn't working.

# Set up and sample data
library(tidyverse)
test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

# Runs but isn't correct
nested_test <- test_data %>%
  nest(data = class) %>%
  distinct(id, gender) %>%
  count(gender)

nested_test
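
What this pipeline does at each step can be checked by stopping after each verb. The following is a minimal sketch, assuming the test_data defined above (rebuilt here so the block runs on its own); the names step1 and step2 are only illustrative.

```r
library(dplyr)
library(tidyr)

test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

# nest(data = class) packs the class column into a list-column and keeps
# one row per remaining combination of id and gender.
# The result is still a single tibble, not one dataset per class.
step1 <- test_data %>% nest(data = class)
names(step1)
nrow(step1)

# distinct(id, gender) runs on that outer tibble and keeps only the
# columns it is given, so the list-column is dropped at this point.
step2 <- step1 %>% distinct(id, gender)
names(step2)
```

Because the list-column is gone before count() runs, the final count can only describe the data as a whole.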

CodePudding user response:

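Not 100% sure I understand your goal, but this code nests all columns except class into a separate dataframe for each class. Note that you first specify the columns to nest, then the variable(s) to group by in .by (this argument needs tidyr 1.3.0 or later). It then maps over each nested dataframe, applying distinct() and count() to each. As for why your version fails: nest(data = class) still returns a single tibble, with class packed into a list-column, and distinct(id, gender) then runs on that outer tibble and drops the list-column.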

library(dplyr)
library(tidyr)
library(purrr)

nested_test <- test_data %>%
  nest(data = !class, .by = class) %>%
  mutate(data = map(
    data, 
    \(d) count(distinct(d, id, gender), gender)
  ))

Results:

#> nested_test
# A tibble: 3 × 2
  class data            
  <chr> <list>          
1 h     <tibble [2 × 2]>
2 m     <tibble [2 × 2]>
3 s     <tibble [1 × 2]>

#> nested_test$data
[[1]]
# A tibble: 2 × 2
  gender     n
  <chr>  <int>
1 f          1
2 m          2

[[2]]
# A tibble: 2 × 2
  gender     n
  <chr>  <int>
1 f          1
2 m          1

[[3]]
# A tibble: 1 × 2
  gender     n
  <chr>  <int>
1 f          1
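
If a single flat table is easier to work with than a list-column, the nested counts can be expanded again with unnest(). This is a sketch under the same assumptions as the code above (tidyr 1.3.0 or later for .by, R 4.1 or later for the \(d) lambda); flat_counts is just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(purrr)

test_data <- tibble(id = c(1, 1, 2, 2, 2, 3, 3, 3),
                    class = c("h", "h", "m", "h", "s", "m", "h", "h"),
                    gender = c("m", "m", "f", "f", "f", "m", "m", "m"))

# Same per-class computation as in the answer above
nested_test <- test_data %>%
  nest(data = !class, .by = class) %>%
  mutate(data = map(data, \(d) count(distinct(d, id, gender), gender)))

# unnest() expands the list-column back into regular rows,
# giving one row per class/gender pair with its count n.
flat_counts <- nested_test %>% unnest(data)
flat_counts
```

The result has the columns class, gender and n, so it can be filtered or plotted like any other tibble.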