How to use map_dfc and map_dfr (purrr R package) - it appears they are doing the same

Time:11-12

I am trying to use purrr::map() inside dplyr::summarise(). The goal is to create a summary for each iteration. map_dfc() does this nicely but, as the name says, it column-binds the iterations, so I need an extra pivot_longer() call to get the result into long format, ready for plotting.

I also saw that there is a map_dfr() function, which I hoped would row-bind the iterations and save me the pivot_longer() call. It also provides an .id argument to keep track of which iteration each row came from (if I understood correctly). However, both functions give the same output. Am I doing something wrong?

Below is a reproducible example showing that the outputs of map_dfc() and map_dfr() are the same.

# packages
library(tidyverse)

# example dataset
set.seed(45)
tibble(site = rep(c(LETTERS[1:3]), each = 6),
       name = rep(c(letters[10:15]), 3),
       size = runif(18)) %>%
  arrange(site, name) -> d_tibble

head(d_tibble)
#> # A tibble: 6 x 3
#>   site  name   size
#>   <chr> <chr> <dbl>
#> 1 A     j     0.633
#> 2 A     k     0.318
#> 3 A     l     0.241
#> 4 A     m     0.378
#> 5 A     n     0.352
#> 6 A     o     0.298

# some custom function that raises "a" to the power "i"; it is applied over a sequence of i's
test_fct <- function(a, i) {
  a ^ i
}

# create sequence of i's
i_seq <- seq(0, 5, by = 0.1)

d_tibble %>%
  group_by(site, name) %>%
  summarise(purrr::map_dfc(set_names(i_seq), ~ test_fct(size, .x)), .groups = "drop") -> d_out

head(d_out)
#> # A tibble: 6 x 53
#>   site  name    `0` `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `0.7` `0.8` `0.9`   `1`
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     j         1 0.955 0.913 0.872 0.833 0.796 0.760 0.726 0.694 0.663 0.633
#> 2 A     k         1 0.892 0.795 0.709 0.632 0.564 0.502 0.448 0.399 0.356 0.318
#> 3 A     l         1 0.867 0.752 0.652 0.566 0.491 0.426 0.369 0.320 0.278 0.241
#> 4 A     m         1 0.907 0.823 0.747 0.678 0.615 0.558 0.506 0.460 0.417 0.378
#> 5 A     n         1 0.901 0.812 0.731 0.659 0.593 0.535 0.482 0.434 0.391 0.352
#> 6 A     o         1 0.886 0.785 0.695 0.616 0.546 0.483 0.428 0.379 0.336 0.298
#> # ... with 40 more variables: `1.1` <dbl>, `1.2` <dbl>, `1.3` <dbl>,
#> #   `1.4` <dbl>, `1.5` <dbl>, `1.6` <dbl>, `1.7` <dbl>, `1.8` <dbl>,
#> #   `1.9` <dbl>, `2` <dbl>, `2.1` <dbl>, `2.2` <dbl>, `2.3` <dbl>, `2.4` <dbl>,
#> #   `2.5` <dbl>, `2.6` <dbl>, `2.7` <dbl>, `2.8` <dbl>, `2.9` <dbl>, `3` <dbl>,
#> #   `3.1` <dbl>, `3.2` <dbl>, `3.3` <dbl>, `3.4` <dbl>, `3.5` <dbl>,
#> #   `3.6` <dbl>, `3.7` <dbl>, `3.8` <dbl>, `3.9` <dbl>, `4` <dbl>, `4.1` <dbl>,
#> #   `4.2` <dbl>, `4.3` <dbl>, `4.4` <dbl>, `4.5` <dbl>, `4.6` <dbl>, ...

d_out %>%
  pivot_longer(where(is.double), names_to = "names", values_to = "values")
#> # A tibble: 918 x 4
#>    site  name  names values
#>    <chr> <chr> <chr>  <dbl>
#>  1 A     j     0      1    
#>  2 A     j     0.1    0.955
#>  3 A     j     0.2    0.913
#>  4 A     j     0.3    0.872
#>  5 A     j     0.4    0.833
#>  6 A     j     0.5    0.796
#>  7 A     j     0.6    0.760
#>  8 A     j     0.7    0.726
#>  9 A     j     0.8    0.694
#> 10 A     j     0.9    0.663
#> # ... with 908 more rows


# there is also a map_dfr() version that row-binds into a data frame and takes an .id argument
d_tibble %>%
  group_by(site, name) %>%
  summarise(purrr::map_dfr(set_names(i_seq),
                           ~ test_fct(size, .x), .id = "id"), .groups = "drop") -> d_out2

head(d_out2)
#> # A tibble: 6 x 53
#>   site  name    `0` `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `0.7` `0.8` `0.9`   `1`
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A     j         1 0.955 0.913 0.872 0.833 0.796 0.760 0.726 0.694 0.663 0.633
#> 2 A     k         1 0.892 0.795 0.709 0.632 0.564 0.502 0.448 0.399 0.356 0.318
#> 3 A     l         1 0.867 0.752 0.652 0.566 0.491 0.426 0.369 0.320 0.278 0.241
#> 4 A     m         1 0.907 0.823 0.747 0.678 0.615 0.558 0.506 0.460 0.417 0.378
#> 5 A     n         1 0.901 0.812 0.731 0.659 0.593 0.535 0.482 0.434 0.391 0.352
#> 6 A     o         1 0.886 0.785 0.695 0.616 0.546 0.483 0.428 0.379 0.336 0.298
#> # ... with 40 more variables: `1.1` <dbl>, `1.2` <dbl>, `1.3` <dbl>,
#> #   `1.4` <dbl>, `1.5` <dbl>, `1.6` <dbl>, `1.7` <dbl>, `1.8` <dbl>,
#> #   `1.9` <dbl>, `2` <dbl>, `2.1` <dbl>, `2.2` <dbl>, `2.3` <dbl>, `2.4` <dbl>,
#> #   `2.5` <dbl>, `2.6` <dbl>, `2.7` <dbl>, `2.8` <dbl>, `2.9` <dbl>, `3` <dbl>,
#> #   `3.1` <dbl>, `3.2` <dbl>, `3.3` <dbl>, `3.4` <dbl>, `3.5` <dbl>,
#> #   `3.6` <dbl>, `3.7` <dbl>, `3.8` <dbl>, `3.9` <dbl>, `4` <dbl>, `4.1` <dbl>,
#> #   `4.2` <dbl>, `4.3` <dbl>, `4.4` <dbl>, `4.5` <dbl>, `4.6` <dbl>, ...

Created on 2022-11-11 with reprex v2.0.2

Answer:

Maybe you are looking for something like this:

library(tidyverse)

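# assumes d_tibble, i_seq and test_fct from the question are already defined
d_tibble |>
  group_split(site, name) |>               # one data frame per site/name group
  map_dfr(~tibble(site = .x$site,          # each iteration returns a tibble ...
                  name = .x$name,
                  i = i_seq,
                  val = test_fct(.x$size, i_seq)))  # ... which map_dfr() row-binds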
#> # A tibble: 918 x 4
#>    site  name      i   val
#>    <chr> <chr> <dbl> <dbl>
#>  1 A     j       0   1    
#>  2 A     j       0.1 0.955
#>  3 A     j       0.2 0.913
#>  4 A     j       0.3 0.872
#>  5 A     j       0.4 0.833
#>  6 A     j       0.5 0.796
#>  7 A     j       0.6 0.760
#>  8 A     j       0.7 0.726
#>  9 A     j       0.8 0.694
#> 10 A     j       0.9 0.663
#> # ... with 908 more rows

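map_dfr() row-binds its results with dplyr::bind_rows(), so it expects each iteration to return a data frame. In your code each iteration returns a bare number, and a named list of such vectors is treated as the columns of a single data frame. That is why you get the same wide output as with map_dfc(), and why .id has no effect. If you split the data frame by group and return a tibble with the expected output for each group, map_dfr() gives the long result directly.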
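
To see the difference in isolation, here is a minimal sketch outside of summarise(). The names i_small, size, wide and long are made up for illustration, and the size value is arbitrary. It assumes purrr, tibble and dplyr are installed (map_dfr() needs dplyr for the row-binding).

```r
library(purrr)
library(tibble)

# same function as in the question
test_fct <- function(a, i) a ^ i

i_small <- set_names(c(0, 0.5, 1))  # shortened stand-in for i_seq
size <- 0.25                        # made-up size value for a single group

# each iteration returns a bare number: the named list of results is
# turned into ONE row with one column per name, and .id is not used
wide <- map_dfr(i_small, ~ test_fct(size, .x), .id = "id")

# each iteration returns a one-row tibble: rows are stacked and
# .id records the name of the iteration each row came from
long <- map_dfr(i_small, ~ tibble(val = test_fct(size, .x)), .id = "id")

dim(wide)  # 1 row, 3 columns
dim(long)  # 3 rows, 2 columns (id, val)
```

The same idea (wrapping the test_fct() call in tibble()) should carry over to the map_dfr() call in the question, though I have not checked how summarise() handles the multi-row result across dplyr versions.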
