I am trying to use purrr::map() inside dplyr::summarise(). The goal is to create a summary for each iteration. The map_dfc() function does this nicely, but, as its name says, it column-binds the iterations. That means an extra pivot_longer() step is needed to get the result into long format, ready for plotting. I also saw that there is a map_dfr() function, which I hoped would save me the pivot_longer() call by row-binding the iterations instead. It also provides an .id argument to keep track of which iteration each row came from (if I understood correctly). However, both functions give the same output. Am I doing something wrong? The reproducible example below shows that the outputs of map_dfc() and map_dfr() are identical.
# packages
library(tidyverse)
# example dataset
set.seed(45)
tibble(site = rep(c(LETTERS[1:3]), each = 6),
name = rep(c(letters[10:15]), 3),
size = runif(18)) %>%
arrange(site, name) -> d_tibble
head(d_tibble)
#> # A tibble: 6 x 3
#> site name size
#> <chr> <chr> <dbl>
#> 1 A j 0.633
#> 2 A k 0.318
#> 3 A l 0.241
#> 4 A m 0.378
#> 5 A n 0.352
#> 6 A o 0.298
# some custom function that is supposed to be evaluated for a sequence of "i"s (here simply a^i)
test_fct <- function(a, i) {
a ^ i
}
# create sequence of i's
i_seq <- seq(0, 5, by = 0.1)
d_tibble %>%
group_by(site, name) %>%
summarise(purrr::map_dfc(set_names(i_seq), ~ test_fct(size, .x)), .groups = "drop") -> d_out
head(d_out)
#> # A tibble: 6 x 53
#> site name `0` `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `0.7` `0.8` `0.9` `1`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A j 1 0.955 0.913 0.872 0.833 0.796 0.760 0.726 0.694 0.663 0.633
#> 2 A k 1 0.892 0.795 0.709 0.632 0.564 0.502 0.448 0.399 0.356 0.318
#> 3 A l 1 0.867 0.752 0.652 0.566 0.491 0.426 0.369 0.320 0.278 0.241
#> 4 A m 1 0.907 0.823 0.747 0.678 0.615 0.558 0.506 0.460 0.417 0.378
#> 5 A n 1 0.901 0.812 0.731 0.659 0.593 0.535 0.482 0.434 0.391 0.352
#> 6 A o 1 0.886 0.785 0.695 0.616 0.546 0.483 0.428 0.379 0.336 0.298
#> # ... with 40 more variables: `1.1` <dbl>, `1.2` <dbl>, `1.3` <dbl>,
#> # `1.4` <dbl>, `1.5` <dbl>, `1.6` <dbl>, `1.7` <dbl>, `1.8` <dbl>,
#> # `1.9` <dbl>, `2` <dbl>, `2.1` <dbl>, `2.2` <dbl>, `2.3` <dbl>, `2.4` <dbl>,
#> # `2.5` <dbl>, `2.6` <dbl>, `2.7` <dbl>, `2.8` <dbl>, `2.9` <dbl>, `3` <dbl>,
#> # `3.1` <dbl>, `3.2` <dbl>, `3.3` <dbl>, `3.4` <dbl>, `3.5` <dbl>,
#> # `3.6` <dbl>, `3.7` <dbl>, `3.8` <dbl>, `3.9` <dbl>, `4` <dbl>, `4.1` <dbl>,
#> # `4.2` <dbl>, `4.3` <dbl>, `4.4` <dbl>, `4.5` <dbl>, `4.6` <dbl>, ...
d_out %>%
pivot_longer(where(is.double), names_to = "names", values_to = "values")
#> # A tibble: 918 x 4
#> site name names values
#> <chr> <chr> <chr> <dbl>
#> 1 A j 0 1
#> 2 A j 0.1 0.955
#> 3 A j 0.2 0.913
#> 4 A j 0.3 0.872
#> 5 A j 0.4 0.833
#> 6 A j 0.5 0.796
#> 7 A j 0.6 0.760
#> 8 A j 0.7 0.726
#> 9 A j 0.8 0.694
#> 10 A j 0.9 0.663
#> # ... with 908 more rows
# now there is also a map_dfr version that row-binds into a data frame and also takes an .id argument
d_tibble %>%
group_by(site, name) %>%
summarise(purrr::map_dfr(set_names(i_seq),
~ test_fct(size, .x), .id = "id"), .groups = "drop") -> d_out2
head(d_out2)
#> # A tibble: 6 x 53
#> site name `0` `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `0.7` `0.8` `0.9` `1`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A j 1 0.955 0.913 0.872 0.833 0.796 0.760 0.726 0.694 0.663 0.633
#> 2 A k 1 0.892 0.795 0.709 0.632 0.564 0.502 0.448 0.399 0.356 0.318
#> 3 A l 1 0.867 0.752 0.652 0.566 0.491 0.426 0.369 0.320 0.278 0.241
#> 4 A m 1 0.907 0.823 0.747 0.678 0.615 0.558 0.506 0.460 0.417 0.378
#> 5 A n 1 0.901 0.812 0.731 0.659 0.593 0.535 0.482 0.434 0.391 0.352
#> 6 A o 1 0.886 0.785 0.695 0.616 0.546 0.483 0.428 0.379 0.336 0.298
#> # ... with 40 more variables: `1.1` <dbl>, `1.2` <dbl>, `1.3` <dbl>,
#> # `1.4` <dbl>, `1.5` <dbl>, `1.6` <dbl>, `1.7` <dbl>, `1.8` <dbl>,
#> # `1.9` <dbl>, `2` <dbl>, `2.1` <dbl>, `2.2` <dbl>, `2.3` <dbl>, `2.4` <dbl>,
#> # `2.5` <dbl>, `2.6` <dbl>, `2.7` <dbl>, `2.8` <dbl>, `2.9` <dbl>, `3` <dbl>,
#> # `3.1` <dbl>, `3.2` <dbl>, `3.3` <dbl>, `3.4` <dbl>, `3.5` <dbl>,
#> # `3.6` <dbl>, `3.7` <dbl>, `3.8` <dbl>, `3.9` <dbl>, `4` <dbl>, `4.1` <dbl>,
#> # `4.2` <dbl>, `4.3` <dbl>, `4.4` <dbl>, `4.5` <dbl>, `4.6` <dbl>, ...
Created on 2022-11-11 with reprex v2.0.2
Answer:
Maybe you are looking for something like this:
library(tidyverse)
d_tibble |>
group_split(site, name) |>
map_dfr(~tibble(site = .x$site,
name = .x$name,
i = i_seq,
val = test_fct(.x$size, i_seq)))
#> # A tibble: 918 x 4
#> site name i val
#> <chr> <chr> <dbl> <dbl>
#> 1 A j 0 1
#> 2 A j 0.1 0.955
#> 3 A j 0.2 0.913
#> 4 A j 0.3 0.872
#> 5 A j 0.4 0.833
#> 6 A j 0.5 0.796
#> 7 A j 0.6 0.760
#> 8 A j 0.7 0.726
#> 9 A j 0.8 0.694
#> 10 A j 0.9 0.663
#> # ... with 908 more rows
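map_dfr() passes the list of per-iteration results to dplyr::bind_rows(), which stacks them as rows when each result is a data frame. In the question, each iteration returns a single unnamed number, and because the list itself is named (via set_names()), bind_rows() instead treats the whole list as the columns of one single-row data frame. That is why the output matches map_dfc() and why .id adds no column. If you split the data frame by group and have each iteration return a data frame with the expected output for that group, map_dfr() gives the correct long result.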
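If you would rather keep the group_by() + summarise() structure from the question, the same idea can be applied there: make each iteration return a one-row tibble, so that map_dfr() has data frames to stack and .id has rows to label. This is a minimal sketch, assuming dplyr >= 1.1.0 for reframe(). With older dplyr versions, summarise(..., .groups = "drop") as in the question should accept the multi-row result as well. The column names names and values are chosen only to match the question's pivot_longer() call.

```r
# A sketch, assuming dplyr >= 1.1.0 (for reframe()) and a purrr version that still has map_dfr().
library(dplyr)
library(purrr)

# Same example data and function as in the question
set.seed(45)
d_tibble <- tibble(site = rep(LETTERS[1:3], each = 6),
                   name = rep(letters[10:15], 3),
                   size = runif(18)) %>%
  arrange(site, name)

test_fct <- function(a, i) {
  a ^ i
}
i_seq <- seq(0, 5, by = 0.1)

d_long <- d_tibble %>%
  group_by(site, name) %>%
  # Each iteration returns a one-row tibble, so bind_rows() (inside map_dfr())
  # stacks them as rows and .id stores the name of each i ("0", "0.1", ...).
  # reframe() splices the unnamed data frame result into the output and
  # returns an ungrouped tibble.
  reframe(map_dfr(set_names(i_seq),
                  ~ tibble(values = test_fct(size, .x)),
                  .id = "names"))

head(d_long)
```

This gives one row per site, name and i (18 groups × 51 values of i = 918 rows) in the same long layout as the pivot_longer() result, without the intermediate wide table.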