I have a dataframe, where different lines require different evaluations to compute a result. Each of these evaluations is implemented in a function, and the respective function to use is specified in a column in the dataframe. Here is a minimal example:
f1 = function(a,...){return(2*a)}
f2 = function(a,b,...){return(a + b)}
df = data.frame(a=1:4,b=5:8,f=c('f1','f2','f2','f1'))
#Expected result:
  a b  f result
1 1 5 f1      2
2 2 6 f2      8
3 3 7 f2     10
4 4 8 f1      8
With pmap(), I am able to apply a function to each row of a dataframe, and I also read about exec() replacing invoke_map(), but none of my attempts to combine both seem to work because exec() only seems to work with lists:
df$result = purrr::pmap(df,df$f)
df$result = purrr::pmap(df$f,exec,df)
...
Is there a more elegant way than filtering the dataframe for each function, using pmap on each filtered dataframe and then binding everything back together?
Thank you in advance!
Edit: I should mention that my dataframe has a lot of columns, and that the functions do not all need the same arguments (e.g. some may skip `a` but require `b`). Therefore I need a method that does not require passing the arguments explicitly.
CodePudding user response:
You can do this with exec() and pmap()
f1 = function(a,...){return(2*a)}
f2 = function(a,b,...){return(a + b)}
df = data.frame(a = 1:4, b = 5:8, f = c('f1', 'f2', 'f2', 'f1'))
require(purrr)
require(dplyr)
df |> mutate(result = pmap(list(f, a, b), exec))
#>   a b  f result
#> 1 1 5 f1      2
#> 2 2 6 f2      8
#> 3 3 7 f2     10
#> 4 4 8 f1      8
Created on 2022-05-27 by the reprex package (v2.0.1)
PS. You might have been getting an error because you were passing named arguments to exec(). When you call pmap(list(f = "f1", a = 1, b = 1), exec), all the named arguments are passed to ... in exec(.fn, ...), because none of the list elements is named .fn.
In the above example, the list elements are passed without their names, and the first argument is therefore matched positionally (by exec()) to .fn.
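To see the difference in isolation, here is a minimal sketch that calls exec() directly for a single row. It assumes purrr is attached (it re-exports exec() from rlang) and redefines f1 so the snippet is self-contained. The tryCatch() wrapper is only there so the script keeps running after the failing call.

```r
library(purrr)

f1 <- function(a, ...) 2 * a

# Unnamed arguments: the first value is matched positionally to .fn
unnamed <- exec("f1", 1, 5)

# Named arguments: f, a and b all end up in ..., so .fn is never supplied
named <- tryCatch(
  exec(f = "f1", a = 1, b = 5),
  error = function(e) e
)

unnamed                  # value returned by f1(1, 5)
inherits(named, "error") # TRUE when the named call failed
```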
So you can use the method you suggested in conjunction with base::unname():
df |> relocate(f) |> unname() |> pmap(exec)
# [[1]]
# [1] 2
#
# [[2]]
# [1] 8
#
# [[3]]
# [1] 10
#
# [[4]]
# [1] 8
Whereas without unname() you will get an error:
df |> relocate(f) |> pmap(exec)
# Error in .f(f = .l[[1L]][[i]], a = .l[[2L]][[i]], b = .l[[3L]][[i]], ...):
# argument ".fn" is missing, with no default
Alternatively, you could rename df$f to df$.fn and pass the whole data.frame:
df |> rename(.fn = "f") |> pmap(exec)
# [[1]]
# [1] 2
#
# [[2]]
# [1] 8
#
# [[3]]
# [1] 10
#
# [[4]]
# [1] 8
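Since the question mentions a dataframe with many columns, here is a sketch of the rename approach with an extra column and a numeric (rather than list) result. The column z is hypothetical and added only for illustration. The sketch relies on every function accepting ..., so unused columns are silently absorbed, and on pmap_dbl() being appropriate because each function returns a single number.

```r
library(purrr)
library(dplyr)

f1 <- function(a, ...) 2 * a
f2 <- function(a, b, ...) a + b

# z is a hypothetical extra column that neither function uses
df <- data.frame(a = 1:4, b = 5:8, z = letters[1:4],
                 f = c("f1", "f2", "f2", "f1"))

# Rename f to .fn so exec() picks it up; all other columns are passed by name.
# pmap_dbl() returns a plain numeric vector instead of a list.
df$result <- df |> rename(.fn = f) |> pmap_dbl(exec)

df
```

If a function lacks ..., it will fail on the columns it does not know about, so this pattern only works when all functions share that catch-all argument.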
CodePudding user response:
Using lapply() over the rows, together with do.call():
df$result = lapply(seq_len(nrow(df)), \(i) {
  # call the function named in column f, passing the row's other columns as named arguments
  do.call(df[i,"f"], as.list(subset(df[i,], select=-f)))
})
Output:
      a     b f     result
  <int> <int> <chr>  <dbl>
1     1     5 f1         2
2     2     6 f2         8
3     3     7 f2        10
4     4     8 f1         8
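Note that lapply() returns a list, so result becomes a list column. If a plain numeric column is preferred, one possible base R variant uses vapply() instead. This is only a sketch: the helper name arg_cols is made up here, and it assumes every function returns a single number.

```r
f1 <- function(a, ...) 2 * a
f2 <- function(a, b, ...) a + b
df <- data.frame(a = 1:4, b = 5:8, f = c("f1", "f2", "f2", "f1"))

# Columns to pass as arguments: everything except the function-name column
arg_cols <- setdiff(names(df), "f")

df$result <- vapply(seq_len(nrow(df)), function(i) {
  # as.list() on a one-row data frame gives a named list of scalar arguments
  args <- as.list(df[i, arg_cols, drop = FALSE])
  as.numeric(do.call(df$f[i], args))
}, numeric(1))

df
```

Fixing arg_cols before the loop also keeps a previously computed result column from being passed along as an argument on a second run.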