How to extract several values from a function in a dplyr pipeline

Time:09-22

Is there a good way to build a dplyr pipeline in which mutate extracts several columns from a function in a single step? For example, imagine that you have a dataframe like this:

 x y
 1 5
 2 3
 6 4

and you have a function that returns both the sum and the product:

sum_and_product <- function(x, y) list(sum = x + y, product = x * y)

so how does one make a pipeline producing the original dataframe enriched by sum and product columns calculated with a single call? Something like:

df %>% mutate_multiple(c(sum, product)=sum_and_product(x, y))

x y sum product
1 5 6   5
2 3 5   6
6 4 10  24

If this can't be done with a dplyr pipeline, what other alternatives are there?

To give you a better idea of what I'm trying to achieve in my real-life use case: I need to calculate structural change points for multiple time series stored in a single data frame. When I only calculate the moment in time when the break occurs, I can do it quite simply and efficiently:

df %>% group_by(timeseries_id) %>% mutate(cpt = my.cpt(time, value))

But the problem is that my.cpt must return 3 values instead of just one (the time of the change, the value before and the value after), and that breaks everything. When I do it with a loop, it's terribly slow (and also ugly). I guess I could write 3 functions, one for each value to extract, but obviously that's not ideal.

Any suggestions would be appreciated.

Best regards, Nikolai

CodePudding user response:

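Change your function to return a data.frame instead of a list and it will work (mutate unpacks an unnamed data frame result into separate columns in dplyr 1.0.0 and later), i.e.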

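library(dplyr)

# Example data from the question
df <- data.frame(x = c(1L, 2L, 6L), y = c(5L, 3L, 4L))

# Return a data.frame so that each element becomes its own column
sum_and_product <- function(x, y) data.frame(sum = x + y, product = x * y)

df %>% 
 mutate(sum_and_product(x, y))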
#  x y     sum     product
#1 1 5       6           5
#2 2 3       5           6
#3 6 4      10          24
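
The same idea should carry over to the grouped change-point case from the question. The following is only a sketch: my_cpt is a hypothetical stand-in for the asker's my.cpt (it simply splits each series at its midpoint, which is not a real change-point method), and df_ts and its values are made up for illustration. It assumes dplyr 1.0.0 or later. The function returns a one-row data.frame, which mutate recycles to every row of the group.

```r
library(dplyr)

# Hypothetical placeholder for my.cpt(): splits the series at its midpoint
# and reports the split time plus the mean value before and after it.
my_cpt <- function(time, value) {
  k <- length(value) %/% 2
  data.frame(
    cpt_time = time[k + 1],
    before   = mean(value[seq_len(k)]),
    after    = mean(value[(k + 1):length(value)])
  )
}

# Made-up example data: two series of four points each
df_ts <- data.frame(
  timeseries_id = rep(c("a", "b"), each = 4),
  time  = rep(1:4, times = 2),
  value = c(1, 1, 5, 5, 10, 10, 2, 2)
)

result <- df_ts %>%
  group_by(timeseries_id) %>%
  mutate(my_cpt(time, value)) %>%  # unnamed one-row data.frame becomes three columns
  ungroup()
```

If one row per series is enough, replacing mutate with summarise in the same pipeline should give a compact table with one line per timeseries_id.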

CodePudding user response:

You can store the output of sum_and_product in a list column and then use unnest_wider to spread its elements into separate columns.

library(dplyr)
library(tidyr)

sum_and_product <- function(x, y) list(sum = x + y, product = x * y)

df %>%
  rowwise() %>%
  mutate(z = list(sum_and_product(x, y))) %>%
  unnest_wider(z)

#      x     y   sum product
#  <int> <int> <int>   <int>
#1     1     5     6       5
#2     2     3     5       6
#3     6     4    10      24
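
As for alternatives outside a dplyr pipeline, here is a minimal base R sketch. It assumes the function is vectorized, as sum_and_product is, so it can be called once on whole columns rather than once per row. The name enriched is just illustrative.

```r
sum_and_product <- function(x, y) list(sum = x + y, product = x * y)

# Example data from the question
df <- data.frame(x = c(1, 2, 6), y = c(5, 3, 4))

# Call the function once on the full columns, convert the named list
# to a data.frame, and bind it to the original columns.
enriched <- cbind(df, as.data.frame(sum_and_product(df$x, df$y)))
```

This avoids the per-row overhead of rowwise(), but it only works when the function accepts whole vectors.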