I have a problem with passing arguments to `purrr::pmap` when using it with `mutate`.
I don't understand why some things work and some don't.
My example data:
```r
library(dplyr)
library(purrr)
sdf <- tibble(
  col_id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10"),
  col_a = c(0.7, 0.3, 1.4, 0.7, 0.5, 1.1, 0.1, 0.6, 1.7, 0.5),
  col_b = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7, 3.7, 0.7, 0.7),
  col_c = c(0.3, 0.4, 1.0, NA, 3.1, 0.2, 0.4, 1.0, 0.1, 0.5))
params <- c("col_a", "col_b", "col_c")
```
Then I want to apply some functions row-wise using `pmap_dbl`.
The first snippet (below) works as intended.
```r
# code 1
sdf_2 <- sdf %>%
  select(all_of(params)) %>%
  mutate(sum_p = pmap_dbl(., sum, na.rm = TRUE))
```
But the same syntax doesn't work with a different function:
```r
sdf_2 <- sdf %>%
  select(all_of(params)) %>%
  mutate(mean_p = pmap_dbl(., mean, na.rm = TRUE))
```
```
Error in mutate(., mean_p = pmap_dbl(., mean, na.rm = TRUE)) :
Caused by error in `mean.default()`:
! argument "x" is missing, with no default
```
Also, when I try to pass arguments to `sum` directly, rather than through `...`, it does not work:
```r
sdf_2 <- sdf %>%
  select(all_of(params)) %>%
  mutate(sum_p = pmap_dbl(., sum(na.rm = TRUE)))
```
```
Error in mutate(., sum_p = pmap_dbl(., sum(na.rm = TRUE))) :
Caused by error in `pluck()`:
! argument "x" is missing, with no default
```
What is the correct way to pass arguments to functions inside `pmap` when working row-wise across a whole data frame?
Next question:
Is there any way to pass the column names stored in `params` so that the function in `pmap` runs only on those columns?
`select(all_of(params))` works, but the resulting data frame has no id column. It's easy to recreate, but it would be nice not to remove it at all.
Answer:
Why can't I pass `mean` to `pmap`?
Try:
```r
mean(0.7, NA, 0.3, na.rm = TRUE)
sum(0.7, NA, 0.3, na.rm = TRUE)
```
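`mean` takes its data through a single argument, `x`, while `sum` takes the values directly through `...` (check the documentation of each). For `mean` you'll need to combine the values into one vector first: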
```r
mean(c(0.7, NA, 0.3), na.rm = TRUE)
```
I.e., applied to your data:
```r
library(dplyr)
library(purrr)
sdf |>
  mutate(mean_p = pmap_dbl(across(all_of(params)), ~ mean(c(...), na.rm = TRUE)))
```
Output:
```
# A tibble: 10 × 5
   col_id col_a col_b col_c mean_p
   <chr>  <dbl> <dbl> <dbl>  <dbl>
 1 id1      0.7  NA     0.3  0.5
 2 id2      0.3   0.6   0.4  0.433
 3 id3      1.4   0.2   1    0.867
 4 id4      0.7   0.2  NA    0.45
 5 id5      0.5   0.7   3.1  1.43
 6 id6      1.1   0.2   0.2  0.5
 7 id7      0.1   0.7   0.4  0.4
 8 id8      0.6   3.7   1    1.77
 9 id9      1.7   0.7   0.1  0.833
10 id10     0.5   0.7   0.5  0.567
```
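To make the difference between the two signatures concrete, here is a small base-R sketch that inspects the formal arguments of `mean()` and `sum()` and then reproduces the "missing x" error by calling `mean()` with only named values, which is roughly what `pmap()` does with the columns `col_a`, `col_b` and `col_c`. The variable names `mean_args`, `sum_args` and `res` are only illustrative.

```r
# mean() is an S3 generic, function(x, ...): its data must arrive via `x`.
mean_args <- names(formals(mean))
# sum() is a primitive, so formals(sum) is NULL; args() exposes its signature.
sum_args <- names(formals(args(sum)))

print(mean_args)  # "x" "..."
print(sum_args)   # "..." "na.rm"

# Named values that do not match `x` are swallowed by `...`, so `x` stays
# missing; tryCatch() returns the error message instead of stopping.
res <- tryCatch(
  mean(col_a = 0.7, col_b = NA, col_c = 0.3, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```

`sum()` has no `x` to fill, so every named column simply lands in its `...`.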
How to specify variables in `pmap`?
- With `cur_data()`:
```r
library(dplyr)
library(purrr)
sdf |>
  mutate(sum_p = pmap_dbl(select(cur_data(), all_of(params)), sum, na.rm = TRUE))
```
- With `across`:
```r
library(dplyr)
library(purrr)
sdf |>
  mutate(sum_p = pmap_dbl(across(all_of(params)), sum, na.rm = TRUE))
```
- Manual list:
```r
library(dplyr)
library(purrr)
sdf |>
  mutate(sum_p = pmap_dbl(list(col_a, col_b, col_c), sum, na.rm = TRUE))
```
- With unquote-splicing:
```r
library(dplyr)
library(purrr)
library(rlang)
sdf |>
  mutate(sum_p = pmap_dbl(list(!!!syms(params)), sum, na.rm = TRUE))
```
Output:
```
# A tibble: 10 × 5
   col_id col_a col_b col_c sum_p
   <chr>  <dbl> <dbl> <dbl> <dbl>
 1 id1      0.7  NA     0.3   1
 2 id2      0.3   0.6   0.4   1.3
 3 id3      1.4   0.2   1     2.6
 4 id4      0.7   0.2  NA     0.9
 5 id5      0.5   0.7   3.1   4.3
 6 id6      1.1   0.2   0.2   1.5
 7 id7      0.1   0.7   0.4   1.2
 8 id8      0.6   3.7   1     5.3
 9 id9      1.7   0.7   0.1   2.5
10 id10     0.5   0.7   0.5   1.7
```
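On dplyr 1.1.0 or later, `cur_data()` has been deprecated in favour of `pick()`, which returns the selected columns as a tibble. A minimal sketch of the same row-wise sum with `pick()`, assuming dplyr >= 1.1.0 and restating the example data so it runs on its own. Because `mutate()` keeps all existing columns, `col_id` stays in the result, which also covers the question about not dropping the id column.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0
library(purrr)

sdf <- tibble(
  col_id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10"),
  col_a = c(0.7, 0.3, 1.4, 0.7, 0.5, 1.1, 0.1, 0.6, 1.7, 0.5),
  col_b = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7, 3.7, 0.7, 0.7),
  col_c = c(0.3, 0.4, 1.0, NA, 3.1, 0.2, 0.4, 1.0, 0.1, 0.5))
params <- c("col_a", "col_b", "col_c")

# pick() hands pmap_dbl() only the columns named in `params`;
# mutate() still returns every original column, including col_id.
out <- sdf |>
  mutate(sum_p = pmap_dbl(pick(all_of(params)), sum, na.rm = TRUE))

print(out)
```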
The fast way: use `rowMeans` and `rowSums` with `across`:
```r
library(dplyr)
sdf |> mutate(mean_p = rowMeans(across(all_of(params)), na.rm = TRUE))
sdf |> mutate(sum_p = rowSums(across(all_of(params)), na.rm = TRUE))
```
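One way to check that the vectorised functions agree with the `pmap()` versions is to compute both side by side. This sketch restates the example data; the result column names `sum_pmap`, `sum_row`, `mean_pmap` and `mean_row` are hypothetical and only there for the comparison.

```r
library(dplyr)
library(purrr)

sdf <- tibble(
  col_id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10"),
  col_a = c(0.7, 0.3, 1.4, 0.7, 0.5, 1.1, 0.1, 0.6, 1.7, 0.5),
  col_b = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7, 3.7, 0.7, 0.7),
  col_c = c(0.3, 0.4, 1.0, NA, 3.1, 0.2, 0.4, 1.0, 0.1, 0.5))
params <- c("col_a", "col_b", "col_c")

out <- sdf |>
  mutate(
    # Row-by-row function calls via pmap_dbl()
    sum_pmap  = pmap_dbl(across(all_of(params)), sum, na.rm = TRUE),
    mean_pmap = pmap_dbl(across(all_of(params)), function(...) mean(c(...), na.rm = TRUE)),
    # One vectorised call over the whole column block
    sum_row   = rowSums(across(all_of(params)), na.rm = TRUE),
    mean_row  = rowMeans(across(all_of(params)), na.rm = TRUE)
  )

# all.equal() tolerates tiny floating-point differences between the two routes
print(all.equal(out$sum_pmap, out$sum_row))
print(all.equal(out$mean_pmap, out$mean_row))
```

`rowSums`/`rowMeans` make one call per column block instead of one R function call per row, which is why they are usually the faster choice for plain sums and means.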
Update: added a fourth way.
Answer:
When passed a data frame and a function, `pmap()` passes each column as a named argument, so it expects the column names of the data frame to correspond to the argument names of the function. For instance:
```r
library(dplyr)
library(purrr)
fx <- function(a, b, c) (a + b)^c
dat <- data.frame(a = 1:3, b = 4:6, c = 3:1)
dat %>%
  mutate(d = pmap_dbl(., fx))
```
```
  a b c   d
1 1 4 3 125
2 2 5 2  49
3 3 6 1   9
```
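Because the matching is by name, column order does not matter, whereas an unnamed list is matched by position. The sketch below restates `fx` and `dat` so it runs alone; the names `d_named`, `d_reordered` and `d_positional` are only illustrative.

```r
library(purrr)

fx <- function(a, b, c) (a + b)^c
dat <- data.frame(a = 1:3, b = 4:6, c = 3:1)

# Named columns are matched to fx's arguments by name ...
d_named <- pmap_dbl(dat, fx)
# ... so shuffling the column order changes nothing.
d_reordered <- pmap_dbl(dat[, c("c", "a", "b")], fx)
# With the names stripped, pmap() matches by position instead:
# the old `c` column becomes `a`, old `a` becomes `b`, old `b` becomes `c`.
d_positional <- pmap_dbl(unname(as.list(dat[, c("c", "a", "b")])), fx)

print(d_named)       # values 125, 49, 9
print(d_reordered)   # values 125, 49, 9
print(d_positional)  # values 256, 1024, 4096
```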
`mean()` expects an argument called `x`, but there is no column `x` in your data, hence the error.
However, if a function takes `...` as its first argument, `pmap()` simply passes all columns to `...`. This is the case for `sum()`, which is why your code works with `sum()`.
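The third failing attempt, `pmap_dbl(., sum(na.rm = TRUE))`, appears to fail for a different reason. `sum(na.rm = TRUE)` is evaluated before `pmap_dbl()` runs, and with nothing to add it returns `0`, so `pmap_dbl()` receives a number rather than a function. purrr converts a number passed as `.f` into an element extractor, which would plausibly explain why the error mentions `pluck()`. The sketch below checks the first part and shows two ways to supply extra arguments; `cols` is a hypothetical two-row stand-in for the question's data.

```r
library(purrr)

# sum(na.rm = TRUE) runs first and, with no values, returns 0,
# so pmap_dbl() would be handed the number 0 instead of a function.
f <- sum(na.rm = TRUE)
print(f)
print(is.function(f))

# Hypothetical two-row data standing in for the question's columns.
cols <- list(col_a = c(0.7, 0.3), col_b = c(NA, 0.6), col_c = c(0.3, 0.4))

# Fix 1: pass the function itself; extra arguments go through pmap's `...`.
s1 <- pmap_dbl(cols, sum, na.rm = TRUE)
# Fix 2: wrap the call in an anonymous function that forwards `...`.
s2 <- pmap_dbl(cols, function(...) sum(..., na.rm = TRUE))

print(s1)
print(s2)
```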