Passing arguments to pmap in mutate

Time:01-17

I have a problem passing arguments to purrr::pmap when using it inside mutate. I don't understand why some calls work and others don't.

My example data:

sdf <- tibble(
  col_id  = c("id1",  "id2", "id3", "id4", "id5", "id6",  "id7",  "id8", "id9", "id10"),
  col_a  = c(0.7,  0.3, 1.4, 0.7, 0.5, 1.1,  0.1,  0.6, 1.7, 0.5),
  col_b  = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7,  3.7, 0.7, 0.7),
  col_c  = c(0.3, 0.4,  1.0,  NA,  3.1,  0.2, 0.4,  1.0, 0.1, 0.5))

params = c("col_a", "col_b", "col_c")

Then I want to execute some functions in rows using pmap_dbl.

First code (below) evaluates as intended.

# code 1
sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(sum_p = pmap_dbl(., sum, na.rm = TRUE))

But the same syntax doesn't work with a different function:

sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(mean_p = pmap_dbl(., mean, na.rm = TRUE))

Error in mutate(., mean_p = pmap_dbl(., mean, na.rm = TRUE)) : Caused by error in mean.default(): ! argument "x" is missing, with no default

Also, when I try to pass parameters to the sum function directly, rather than through ..., it does not work:

sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(sum_p = pmap_dbl(., sum(na.rm = TRUE)))

Error in mutate(., sum_p = pmap_dbl(., sum(na.rm = TRUE))) : Caused by error in pluck(): ! argument "x" is missing, with no default

What is the correct way to pass parameters to functions inside pmap when working on whole dataframe horizontally?

Next question: Is there any way to pass the column names stored in params so that the function in pmap is applied only to those columns? select(all_of(params)) works, but the resulting data frame has no id column. It's easy to recreate, but it would be nice not to remove it at all.

CodePudding user response:

Why can't I pass mean to pmap?

Try:

mean(0.7, NA, 0.3, na.rm = TRUE)
sum(0.7, NA, 0.3, na.rm = TRUE)

mean takes an argument x, whereas sum takes ... directly (check the documentation). You'll need:

mean(c(0.7, NA, 0.3), na.rm = TRUE)

I.e.

library(dplyr)
library(purrr)

sdf |> 
  mutate(mean_p = pmap_dbl(across(all_of(params)), ~ mean(c(...), na.rm = TRUE)))

Output:

# A tibble: 10 × 5
   col_id col_a col_b col_c mean_p
   <chr>  <dbl> <dbl> <dbl>  <dbl>
 1 id1      0.7  NA     0.3  0.5  
 2 id2      0.3   0.6   0.4  0.433
 3 id3      1.4   0.2   1    0.867
 4 id4      0.7   0.2  NA    0.45 
 5 id5      0.5   0.7   3.1  1.43 
 6 id6      1.1   0.2   0.2  0.5  
 7 id7      0.1   0.7   0.4  0.4  
 8 id8      0.6   3.7   1    1.77 
 9 id9      1.7   0.7   0.1  0.833
10 id10     0.5   0.7   0.5  0.567
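
The difference between the two signatures can be checked in base R without purrr. The following is a minimal sketch with made-up values (1, 2, 3 are illustrative, not taken from the question's data); it only shows that sum() adds every unnamed value while mean() treats just the first one as data.

```r
# sum() is defined as sum(..., na.rm = FALSE): every unnamed value is summed.
total <- sum(1, 2, 3)

# mean() is defined as mean(x, ...): only the first value is treated as data.
# The remaining positional values are matched to other parameters
# (trim, na.rm), so they are not averaged.
first_only <- mean(1, 2, 3)

# Combining the values into one vector gives the intended average.
average <- mean(c(1, 2, 3))

print(c(total = total, first_only = first_only, average = average))
```

This is why the lambda above wraps the row values in c(...) before handing them to mean().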

How to specify variables in pmap?

  1. With cur_data()
library(dplyr)
library(purrr)

sdf |>
  mutate(sum_p = pmap_dbl(select(cur_data(), all_of(params)), sum, na.rm = TRUE))
  2. With across
library(dplyr)
library(purrr)

sdf |> 
  mutate(sum_p = pmap_dbl(across(all_of(params)), sum, na.rm = TRUE))
  3. Manual list
library(dplyr)
library(purrr)

sdf |>
  mutate(sum_p = pmap_dbl(list(col_a, col_b, col_c), sum, na.rm = TRUE))
  4. With unquote-splicing:
library(dplyr)
library(purrr)
library(rlang)

sdf |>
  mutate(sum_p = pmap_dbl(list(!!!syms(params)), sum, na.rm = TRUE))

Output:

# A tibble: 10 × 5
   col_id col_a col_b col_c sum_p
   <chr>  <dbl> <dbl> <dbl> <dbl>
 1 id1      0.7  NA     0.3   1  
 2 id2      0.3   0.6   0.4   1.3
 3 id3      1.4   0.2   1     2.6
 4 id4      0.7   0.2  NA     0.9
 5 id5      0.5   0.7   3.1   4.3
 6 id6      1.1   0.2   0.2   1.5
 7 id7      0.1   0.7   0.4   1.2
 8 id8      0.6   3.7   1     5.3
 9 id9      1.7   0.7   0.1   2.5
10 id10     0.5   0.7   0.5   1.7
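
As a possible fifth variant: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick(), so on a recent dplyr the first approach could be sketched as below. This assumes dplyr >= 1.1.0 and R >= 4.1 (for the |> pipe), and uses only the first three rows of the example data to keep it short.

```r
library(dplyr)
library(purrr)

# First three rows of the example data from the question.
sdf <- tibble(
  col_id = c("id1", "id2", "id3"),
  col_a  = c(0.7, 0.3, 1.4),
  col_b  = c(NA, 0.6, 0.2),
  col_c  = c(0.3, 0.4, 1.0)
)
params <- c("col_a", "col_b", "col_c")

# pick() selects the columns named in params from the current data,
# so col_id stays in the result and is not passed to sum().
res <- sdf |>
  mutate(sum_p = pmap_dbl(pick(all_of(params)),
                          function(...) sum(..., na.rm = TRUE)))

print(res)
```

The explicit function(...) wrapper is a stylistic choice here; passing sum and na.rm = TRUE as separate arguments, as in the variants above, should behave the same way.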

The fast way: Using rowMeans and rowSums with across:

library(dplyr)

sdf |> mutate(mean_p = rowMeans(across(all_of(params)), na.rm = TRUE))
sdf |> mutate(sum_p = rowSums(across(all_of(params)), na.rm = TRUE))
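
For comparison, the same row-wise summaries can be sketched in base R alone by indexing the data frame with the params vector. This is an illustration on the first three rows of the example data, not part of the original answer; the id column is kept because only the subset sdf[params] is passed to rowMeans() and rowSums().

```r
# First three rows of the example data, as a plain data frame.
sdf <- data.frame(
  col_id = c("id1", "id2", "id3"),
  col_a  = c(0.7, 0.3, 1.4),
  col_b  = c(NA, 0.6, 0.2),
  col_c  = c(0.3, 0.4, 1.0)
)
params <- c("col_a", "col_b", "col_c")

# sdf[params] selects only the numeric columns; the results are
# assigned back as new columns, so col_id is never dropped.
sdf$mean_p <- rowMeans(sdf[params], na.rm = TRUE)
sdf$sum_p  <- rowSums(sdf[params], na.rm = TRUE)

print(sdf)
```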

Update: added the fourth way.

CodePudding user response:

When passed a dataframe and function, pmap() expects the column names of the dataframe to correspond to the arguments of the function. For instance,

fx <- function(a, b, c) (a + b)^c

dat <- data.frame(a = 1:3, b = 4:6, c = 3:1)

dat %>%
  mutate(d = pmap_dbl(., fx))

  a b c   d
1 1 4 3 125
2 2 5 2  49
3 3 6 1   9

mean() expects an argument called x, but there’s no column x in your data - hence the error.

However, if a function takes ... as its first arg, pmap() just passes all columns to .... This is the case for sum(), which is why your code works with sum().
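
The name matching described above can be approximated by hand with do.call(), which splices a named list into a call. This is a rough sketch of the idea, not purrr's actual implementation; row_mean is a hypothetical helper introduced only for illustration, and the two-row data frame reuses the first rows of the question's data.

```r
library(purrr)

dat <- data.frame(
  col_a = c(0.7, 0.3),
  col_b = c(NA, 0.6),
  col_c = c(0.3, 0.4)
)

# One row as a named list: list(col_a = 0.7, col_b = NA, col_c = 0.3).
row1 <- as.list(dat[1, ])

# sum() collects everything in ..., so the column names do not matter.
sum_row1 <- do.call(sum, c(row1, na.rm = TRUE))

# mean() needs an argument named x; none of the names match, so the
# call fails. The error is captured instead of stopping the script.
mean_res <- tryCatch(
  do.call(mean, c(row1, na.rm = TRUE)),
  error = function(e) e
)

# Hypothetical wrapper: gather all columns via ... and combine them
# into a single vector before calling mean().
row_mean <- function(...) mean(c(...), na.rm = TRUE)
means <- pmap_dbl(dat, row_mean)

print(sum_row1)
print(conditionMessage(mean_res))
print(means)
```

A wrapper that accepts ... works with any column names, which is the same idea as the ~ mean(c(...), na.rm = TRUE) lambda in the other answer.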
