I am trying to find the mean of the variable disp in the mtcars dataset after nesting it by cyl. I am able to get the result after nest_by but not with group_nest. Please explain what rowwise is doing differently here.
library(pacman)
#> Warning: package 'pacman' was built under R version 4.2.1
p_load(tidyverse)
# working
mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
#> # A tibble: 3 × 3
#> # Rowwise: cyl
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
# not working
mtcars %>% group_nest(cyl) %>%
mutate(avg = mean(data$disp))
#> Error in `mutate()`:
#> ! Problem while computing `avg = mean(data$disp)`.
#> Caused by error:
#> ! Corrupt x: no names
#> Backtrace:
#> ▆
#> 1. ├─mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> 2. ├─dplyr::mutate(., avg = mean(data$disp))
#> 3. ├─dplyr:::mutate.data.frame(., avg = mean(data$disp))
#> 4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
#> 5. │ ├─base::withCallingHandlers(...)
#> 6. │ └─mask$eval_all_mutate(quo)
#> 7. ├─base::mean(data$disp)
#> 8. ├─data$disp
#> 9. ├─vctrs:::`$.vctrs_list_of`(data, disp)
#> 10. └─base::.handleSimpleError(`<fn>`, "Corrupt x: no names", base::quote(NULL))
#> 11. └─dplyr (local) h(simpleError(msg, call))
#> 12. └─rlang::abort(...)
Created on 2022-10-26 with reprex v2.0.2
CodePudding user response:
rowwise changes the behavior of subsequent verbs: instead of operating on an entire column, they operate only on the values in a given row.
The following works because data inside mutate refers to a single data frame (nest_by returns a rowwise data frame):
library(dplyr)
library(purrr)
mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
#> # A tibble: 3 × 3
#> # Rowwise: cyl
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
This will not work because, without rowwise, data refers to the whole list-column of data frames, and disp is not a name in that list:
mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> Error in `mutate()`:
#> ! Problem while computing `avg = mean(data$disp)`.
#> Caused by error:
#> ! Corrupt x: no names
#> Backtrace:
#> ▆
#> 1. ├─mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> 2. ├─dplyr::mutate(., avg = mean(data$disp))
#> 3. ├─dplyr:::mutate.data.frame(., avg = mean(data$disp))
#> 4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
#> 5. │ ├─base::withCallingHandlers(...)
#> 6. │ └─mask$eval_all_mutate(quo)
#> 7. ├─base::mean(data$disp)
#> 8. ├─data$disp
#> 9. ├─vctrs:::`$.vctrs_list_of`(data, disp)
#> 10. └─base::.handleSimpleError(`<fn>`, "Corrupt x: no names", base::quote(NULL))
#> 11. └─dplyr (local) h(simpleError(msg, call))
#> 12. └─rlang::abort(...)
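To make the difference concrete, here is a small sketch that inspects what data looks like inside mutate in each case. The helper column names is_df and n_elements are made up for illustration; everything else is plain dplyr.

```r
library(dplyr)

# nest_by() returns a rowwise data frame, so inside mutate()
# `data` is the single tibble belonging to the current row.
rowwise_check <- mtcars %>%
  nest_by(cyl) %>%
  mutate(is_df = is.data.frame(data)) %>%
  ungroup()

# group_nest() returns an ordinary tibble, so inside mutate()
# `data` is the entire list-column (one element per cyl value).
plain_check <- mtcars %>%
  group_nest(cyl) %>%
  mutate(
    is_df      = is.data.frame(data),  # tests the list as a whole
    n_elements = length(data)          # number of nested tibbles
  )

rowwise_check$is_df
plain_check$is_df
plain_check$n_elements
```

In the rowwise case is_df should be TRUE on every row, while in the group_nest case it should be FALSE, because data$disp is being asked of a list rather than of a data frame.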
You can obtain the equivalent result by mapping over the list of data frames, applying a function to each one:
mtcars %>% group_nest(cyl) %>% mutate(avg = map_dbl(data, ~ mean(.x$disp)))
#> # A tibble: 3 × 3
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
Created on 2022-10-26 with reprex v2.0.2
CodePudding user response:
We could use map to loop over the list, as there is no rowwise grouping with group_nest:
library(dplyr)
library(purrr)
mtcars %>%
group_nest(cyl) %>%
mutate(avg = map_dbl(data, ~ mean(.x$disp)))
-output
# A tibble: 3 × 3
cyl data avg
<dbl> <list<tibble[,10]>> <dbl>
1 4 [11 × 10] 105.
2 6 [7 × 10] 183.
3 8 [14 × 10] 353.
According to ?group_nest:
The primary use case for group_nest() is with already grouped data frames, typically a result of group_by().
whereas ?nest_by says:
nest_by() is closely related to group_by(). However, instead of storing the group structure in the metadata, it is made explicit in the data, giving each group key a single row along with a list-column of data frames that contain all the other data.
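As a rough illustration of the documented use case, the sketch below calls group_nest() on an already grouped data frame and compares it with passing cyl directly. The object names (nested_direct, nested_grouped, avg_direct, avg_grouped) are only for this example. In both cases the result is assumed to be a regular, non-rowwise tibble, so the mean still has to be computed by mapping over the list-column.

```r
library(dplyr)
library(purrr)

# Passing the grouping column directly to group_nest()
nested_direct <- mtcars %>%
  group_nest(cyl)

# The use case described in ?group_nest: an already grouped data frame
nested_grouped <- mtcars %>%
  group_by(cyl) %>%
  group_nest()

# Neither result is rowwise, so map over the list-column in both cases
avg_direct  <- map_dbl(nested_direct$data,  ~ mean(.x$disp))
avg_grouped <- map_dbl(nested_grouped$data, ~ mean(.x$disp))

class(nested_grouped)
```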
CodePudding user response:
> library(pacman)
> p_load(tidyverse)
> # working
> mtcars %>% nest_by(cyl) %>% class()
[1] "rowwise_df" "tbl_df" "tbl" "data.frame"
> mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
# A tibble: 3 × 3
# Rowwise: cyl
cyl data avg
<dbl> <list<tibble[,10]>> <dbl>
1 4 [11 × 10] 105.
2 6 [7 × 10] 183.
3 8 [14 × 10] 353.
> # not working
> mtcars %>% group_nest(cyl) %>% class()
[1] "tbl_df" "tbl" "data.frame"
> mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
Error in `mutate()`:
! Problem while computing `avg = mean(data$disp)`.
Caused by error:
! Corrupt x: no names
Run `rlang::last_error()` to see where the error occurred.
The nest_by call yields a rowwise_df, which is amenable to the next step in the pipe, whereas group_nest yields a plain old tbl_df, hence the difference.
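If the class is what matters, one option (a sketch, not the only way) is to add the rowwise class yourself by calling rowwise() on the group_nest result. The names via_rowwise and via_nest_by are illustrative.

```r
library(dplyr)

# Convert the plain tbl_df from group_nest() into a rowwise_df,
# after which mean(data$disp) is evaluated once per row.
via_rowwise <- mtcars %>%
  group_nest(cyl) %>%
  rowwise() %>%
  mutate(avg = mean(data$disp)) %>%
  ungroup()

# Reference result from nest_by(), which is rowwise from the start
via_nest_by <- mtcars %>%
  nest_by(cyl) %>%
  mutate(avg = mean(data$disp)) %>%
  ungroup()

all.equal(via_rowwise$avg, via_nest_by$avg)
```

The ungroup() at the end drops the rowwise class again so that later verbs go back to operating on whole columns.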