Pass a vector of arguments to map function

Time:11-30

I'm trying to create a function that will map across a nested tibble. This function needs to take a vector of parameters that will vary for each row.

When I call purrr::map2() on the nested data, purrr iterates over the individual values of the parameter vector instead of passing the whole vector for each row. What can I do to pass the entire vector as a single argument?

library(tidyverse)

myf <- function(x, params) {
  print(params)
  x %>%
    mutate(new_mpg = mpg + rnorm(n(), params[1], params[2])) %>%
    summarise(old = mean(mpg), new = mean(new_mpg)) %>%
    as.list()
}

# Calling function with params defined is great!
myf(mtcars, params = c(5, 10))
#> [1]  5 10
#> $old
#> [1] 20.09062
#> 
#> $new
#> [1] 25.62049

# Does not work in purrr when params is a bare vector: map2() loops over its values
mtcars %>%
  group_by(cyl) %>% # from dplyr
  nest()   %>%
  mutate(
    newold = map2(data, c(5, 10), myf),
  )
#> [1] 5
#> Warning in rnorm(n(), params[1], params[2]): NAs produced
#> [1] 10
#> Warning in rnorm(n(), params[1], params[2]): NAs produced
#> Error: Problem with `mutate()` column `newold`.
#> ℹ `newold = map2(data, c(5, 10), myf)`.
#> ℹ `newold` must be size 1, not 2.
#> ℹ The error occurred in group 1: cyl = 4.

# New function wrapper with hard-coded params
myf2 <- function(x){
  myf(x, c(5, 10))
}

# Works, but the params are hard-coded, which is not what I need
mtcars %>%
  group_by(cyl) %>% # from dplyr
  nest()   %>%
  mutate(
    mean = 5, 
    sd = 10,
    newold = map(data, myf2),
  )
#> [1]  5 10
#> [1]  5 10
#> [1]  5 10
#> # A tibble: 3 × 5
#> # Groups:   cyl [3]
#>     cyl data                mean    sd newold          
#>   <dbl> <list>             <dbl> <dbl> <list>          
#> 1     6 <tibble [7 × 10]>      5    10 <named list [2]>
#> 2     4 <tibble [11 × 10]>     5    10 <named list [2]>
#> 3     8 <tibble [14 × 10]>     5    10 <named list [2]>

Created on 2021-11-29 by the reprex package (v2.0.0)

CodePudding user response:

Skip the group_by() step and call nest() directly; otherwise your data stays grouped after nesting and has to be ungrouped. To get your function to work, wrap the parameter vector in a list, so that map2 treats the whole vector as a single element.

library(tidyverse)

mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    newold = map2_df(data, list(c(5, 10)), myf)
  ) %>%
  unpack(newold)

#> # A tibble: 3 x 4
#>     cyl data                 old   new
#>   <dbl> <list>             <dbl> <dbl>
#> 1     6 <tibble [7 x 10]>   19.7  30.7
#> 2     4 <tibble [11 x 10]>  26.7  31.1
#> 3     8 <tibble [14 x 10]>  15.1  17.0
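
As far as I understand it, the wrapping works because map2 needs its two inputs to have the same length, or one of them to have length 1, in which case that one is recycled. list(c(5, 10)) is a length-1 list, so every call receives the whole vector. Inside a grouped mutate(), data has length 1 per group, so with a bare c(5, 10) it is data that gets recycled and myf runs twice per group with a single number, which would explain the NA warnings and the "must be size 1, not 2" error in the question. A minimal sketch of the recycling rule follows; count_params and data_list are hypothetical names used only for illustration.

```r
library(purrr)

# Hypothetical helper: reports how many values arrived in `params`
count_params <- function(x, params) length(params)

# Stand-in for a list column with three rows
data_list <- list("a", "b", "c")

# A length-1 list is recycled, so each call gets the whole vector c(5, 10)
wrapped <- map2_int(data_list, list(c(5, 10)), count_params)

# A bare length-2 vector cannot be recycled against 3 elements, so map2 errors
bare <- tryCatch(
  map2_int(data_list, c(5, 10), count_params),
  error = function(e) "error"
)
```

Here wrapped holds one count per element of data_list, and bare only records that the second call failed.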

CodePudding user response:

You don't need map2 here. I think what you need is map, with params passed as an extra argument that map forwards to myf on every call.

mtcars %>%
  group_by(cyl) %>% # from dplyr
  nest()   %>%
  mutate(
    newold = map(data, myf, params = c(5, 10)),
  )
# [1]  5 10
# [1]  5 10
# [1]  5 10
# # A tibble: 3 x 3
# # Groups:   cyl [3]
#     cyl data               newold          
#   <dbl> <list>             <list>          
# 1     6 <tibble [7 x 10]>  <named list [2]>
# 2     4 <tibble [11 x 10]> <named list [2]>
# 3     8 <tibble [14 x 10]> <named list [2]>
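
An equivalent way to write the map call is with an explicit anonymous function, which makes it visible which argument varies per row and which stays fixed. This is only a sketch: shift_mpg is a hypothetical, deterministic stand-in for myf (it adds params[1] to mpg instead of random noise) so that the result is reproducible.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical deterministic stand-in for myf (assumption for illustration):
# shifts mpg by params[1] instead of adding rnorm() noise
shift_mpg <- function(x, params) {
  x %>%
    summarise(old = mean(mpg), new = mean(mpg + params[1])) %>%
    as.list()
}

res <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    # `d` is one nested tibble; params is the same full vector on every call
    newold = map(data, function(d) shift_mpg(d, params = c(5, 10)))
  )
```

Swapping shift_mpg for myf should give the same shape of result as the call above, with random noise in the new column.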

If you have multiple sets of params, you can ungroup your data frame, add a list column with your params, and use map2.

mtcars %>%
  group_by(cyl) %>%
  nest()   %>%
  ungroup() %>%
  # Add different sets of params
  mutate(Params = list(a = c(5, 10), b = c(6, 11), c = c(7, 12))) %>%
  mutate(
    newold = map2(data, Params, myf)
  )
# [1]  5 10
# [1]  6 11
# [1]  7 12
# # A tibble: 3 x 4
#     cyl data               Params       newold          
#   <dbl> <list>             <named list> <list>          
# 1     6 <tibble [7 x 10]>  <dbl [2]>    <named list [2]>
# 2     4 <tibble [11 x 10]> <dbl [2]>    <named list [2]>
# 3     8 <tibble [14 x 10]> <dbl [2]>    <named list [2]>
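
If the parameters already live in ordinary columns (like the mean and sd columns in the question), one possible variation is to build the Params list column from them with map2 and c. This is a sketch under assumptions: the column names mu and sigma and the helper shift_mpg are hypothetical, and shift_mpg is a deterministic stand-in for myf.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical deterministic stand-in for myf (assumption for illustration)
shift_mpg <- function(x, params) {
  x %>%
    summarise(old = mean(mpg), new = mean(mpg + params[1])) %>%
    as.list()
}

res <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    mu = c(5, 6, 7),
    sigma = c(10, 11, 12),
    # Combine the two scalar columns into one length-2 vector per row
    Params = map2(mu, sigma, c),
    # Each row's tibble is paired with that row's own parameter vector
    newold = map2(data, Params, shift_mpg)
  )
```

Because nest(data = -cyl) returns an ungrouped tibble, no ungroup() step is needed before adding the columns.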