Assign columns as arguments to purrr::pmap by using their names instead of position

Time:02-19

I'm trying to use the pmap function from the purrr package to loop over the columns of a tibble. Each column contains a model parameter (e.g. sample size, coefficients, etc.) that I would like to use as input for my function, and each row belongs to one simulated model. I would like to tell pmap directly which column each of the arguments ..1, ..2, etc. should refer to, using the column names. However, I'm struggling with that.

For example, the following code would generate three uniform distributions, whereby each row defines the sample size, min and max value of the distribution.

library(tidyverse)  # provides tibble(), pmap() and %>%
set.seed(2022)
test <- tibble(n = c(10, 20, 30),
               min = c(0, 100, 500),
               max = c(100, 500, 1000))

test %>% pmap(..1 = n, ..2 = min, ..3 = max, ~runif(n = ..1, min = ..2, max = ..3) %>% round(digits = 0))

The above code runs. However, explicitly assigning ..1 = n, ..2 = min, etc. in the first part of the pmap call seems to have no effect. Rather, the arguments ..1, ..2, and ..3 seem to refer to the positions of the columns in the data frame. Leaving out that first part yields the same results:

set.seed(2022)
test %>% pmap(~runif(n = ..1, min = ..2, max = ..3) %>% round(digits = 0))

This becomes problematic and error-prone when the data frame has many columns or the order of the columns changes. For example, for the same data with two additional leading columns (test2), the above code throws an error: column 2 is now a character column, and columns 1 and 3 also refer to something different.

test2 <- tibble(model = (1:3),
                type_dist= rep("uniform", 3),
                n = c(10, 20, 30),
                min = c(0, 100, 500),
                max = c(100, 500, 1000))
set.seed(2022)
test2 %>% pmap(..1 = n, ..2 = min, ..3 = max, ~runif(n = ..1, min = ..2, max = ..3) %>% round(digits = 0))

Error in runif(n = ..1, min = ..2, max = ..3) : invalid arguments

It does not matter that I try to assign the ..1, ..2, and ..3 explicitly to the column names. Instead, I would have to make sure that I now refer to columns 3 to 5.

set.seed(2022)
test2 %>% pmap(~runif(n = ..3, min = ..4, max = ..5) %>% round(digits = 0))

With multiple data frames or a larger number of columns in varying order, it is quite easy to mess this up somewhere, e.g. by confusing the order of the parameters.

So my question: Is it possible to explicitly assign the ..1, ..2, ..3, ... arguments that pmap uses to a certain column by its column name, rather than its position in the data frame?

CodePudding user response:

You just need to make sure your columns have the same names as your function's arguments: pmap passes the values of each row to the function as named arguments, so they are matched by name rather than by position. This makes pmap sometimes harder to use with purrr-style ~ functions (which expect ..1, ..2, etc.), and simpler with \(args) or function(args), where you can name the arguments yourself:

test %>%
  pmap(
    \(n, min, max) round(runif(n, min, max))
  )

If you didn’t need to round, it would be even simpler:

test %>% pmap(runif)

If your data frame includes columns that aren’t used as arguments, add ... to absorb them. (Otherwise, you’ll get an “unused arguments” error.)

test2 %>%
  pmap(
    \(n, min, max, ...) round(runif(n, min, max))
  )
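
To illustrate that the matching really is by name, here is a minimal sketch that runs the same function over the original tibble and over a copy with shuffled columns. The names shuffled and draw_rounded are made up for this example and are not part of purrr.

```r
library(tibble)
library(purrr)

test <- tibble(n = c(10, 20, 30),
               min = c(0, 100, 500),
               max = c(100, 500, 1000))

# Same data, columns deliberately shuffled
shuffled <- test[, c("max", "n", "min")]

# Hypothetical helper: its arguments are named after the columns it needs,
# and ... absorbs any extra columns
draw_rounded <- function(n, min, max, ...) round(runif(n, min, max))

set.seed(2022)
by_original_order <- pmap(test, draw_rounded)

set.seed(2022)
by_shuffled_order <- pmap(shuffled, draw_rounded)

# Compare the two results; with name matching, column order should not matter
identical(by_original_order, by_shuffled_order)
```

Because the helper also takes ..., the same call should work unchanged on a data frame with additional columns such as test2.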

CodePudding user response:

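If you want to use pipes inside the anonymous function, you can pass the dots (...) through directly, without specifying ..1, ..2, ..3. This works here because the column names of test match the argument names of runif: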

set.seed(2022)
x = test %>% pmap(..1 = n, ..2 = min, ..3 = max, ~runif(n = ..1, min = ..2, max = ..3) %>% round(digits = 0))

set.seed(2022)
y = test %>%
  pmap(~runif(...) %>% round())

identical(x, y)
[1] TRUE

Alternatively, if you don't mind taking two steps:

z <- test %>%
  pmap(runif) %>%
  map(round)
identical(x, z)
[1] TRUE
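
Passing ... straight through only works when every column is a valid argument, so it would presumably fail on test2 with its extra model and type_dist columns. As a rough sketch of two possible workarounds that keep the ~ syntax: either select the argument columns by name first, or collect the dots into a list and look the values up by name with with(). The object names via_select and via_with are just for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

test2 <- tibble(model = 1:3,
                type_dist = rep("uniform", 3),
                n = c(10, 20, 30),
                min = c(0, 100, 500),
                max = c(100, 500, 1000))

# Option 1: keep only the argument columns (by name), then forward them with ...
set.seed(2022)
via_select <- test2 %>%
  select(n, min, max) %>%
  pmap(~ runif(...) %>% round())

# Option 2: keep all columns and look the values up by name inside the lambda
set.seed(2022)
via_with <- test2 %>%
  pmap(~ with(list(...), round(runif(n, min, max))))

# Number of draws per row, and a check that both options agree
lengths(via_select)
identical(via_select, via_with)
```

Neither option depends on where n, min and max sit in the data frame.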

CodePudding user response:

I don't think you need the complications of pmap to get what you want. Does this achieve your objective?

library(tidyverse)

test2 %>%
  rowwise() %>%
  group_map(
    function(.x, .y) {
      # .x holds the current row; columns are referenced by name
      runif(n = .x$n, min = .x$min, max = .x$max) %>%
        round()
    }
  )
[[1]]
 [1]  77  33  60  49  67  53  36  62 100  60

[[2]]
 [1] 209 379 112 405 405 477 340 236 372 421 266 148 145 252 190 382 330 253 441 190

[[3]]
 [1] 931 717 878 551 650 682 802 916 946 665 870 865 580 937 511 704 659 900 759 689 542 642 799 863 794 538 903 860 589 524
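
If you would rather keep the draws next to the parameters that produced them, a possible variation on the rowwise idea is to store each result in a list column. This is only a sketch; the column name draws and the object name with_draws are made up for the example.

```r
library(dplyr)
library(tibble)

test2 <- tibble(model = 1:3,
                type_dist = rep("uniform", 3),
                n = c(10, 20, 30),
                min = c(0, 100, 500),
                max = c(100, 500, 1000))

set.seed(2022)
with_draws <- test2 %>%
  rowwise() %>%
  # Inside mutate(), n, min and max refer to the columns of the current row
  mutate(draws = list(round(runif(n, min, max)))) %>%
  ungroup()

# Number of simulated values stored for each model
lengths(with_draws$draws)
```

Here the columns are again referenced by name, so reordering or adding columns should not change the result.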