how to use a function with multiple results with groups

Time:10-26

I have a small problem I could use some help with. I defined a function that computes several vectors from two input vectors. The results are linked (which is why I have only one function), and the calculations involve aggregations (each element of the results depends not only on the corresponding elements of the arguments, but also on other elements).

My problem is actually pretty simple: I want to call my function on a data frame, compute the results by group, and store them in several variables.

Basically, here is what I want to do:

library(dplyr)

# Returns two linked results computed from the same inputs
myFunction <- function(x, y){
  list(a = x + y,
       b = cumsum(x))
}

data <- data.frame(var1 = c(1,2,4,7,2),
           var2 = c(2,6,2,4,6),
           groups = c("a", "a", "b", "b", "b"))

data %>% group_by(groups) %>% mutate(new1 = myFunction(var1, var2)[[1]],
                                     new2 = myFunction(var1, var2)[[2]])

However, I would like to call my function only once, unlike in the example.

Does anyone have an idea how to do this? Many thanks!

François

CodePudding user response:

Like this? We can change the function a bit so that it takes the data frame as its first argument, and call it as its own step in the pipeline.

data <- data.frame(var1 = c(1,2,4,7,2),
                   var2 = c(2,6,2,4,6),
                   groups = c("a", "a", "b", "b", "b"))

myFunction <- function(dt, x, y){
  dt %>%
    mutate(new1 = {{ x }} + {{ y }},
           new2 = cumsum({{ x }}))
}

data %>%
  group_by(groups) %>%
  myFunction(var1, var2)

# A tibble: 5 x 5
# Groups:   groups [2]
   var1  var2 groups  new1  new2
  <dbl> <dbl> <chr>  <dbl> <dbl>
1     1     2 a          3     1
2     2     6 a          8     3
3     4     2 b          6     4
4     7     4 b         11    11
5     2     6 b          8    13

Explanation: rlang's {{ }} (embrace) operator combines quoting and unquoting in a single interpolation step. It delays evaluation of the function's arguments until they are needed, so that mutate() looks up var1 and var2 as columns of the grouped data frame instead of R trying to evaluate them as ordinary variables when myFunction is called.
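To make the "quote-and-unquote" idea concrete, here is a minimal sketch of the same function written with the two steps spelled out, using rlang::enquo() to capture each argument and !! to inject it. The name myFunctionQuo is hypothetical and only used for illustration; the parentheses around each !! are there to avoid relying on operator precedence.

```r
library(dplyr)

data <- data.frame(var1 = c(1, 2, 4, 7, 2),
                   var2 = c(2, 6, 2, 4, 6),
                   groups = c("a", "a", "b", "b", "b"))

# Hypothetical two-step equivalent of the {{ }} version
myFunctionQuo <- function(dt, x, y) {
  x <- rlang::enquo(x)  # capture the expression passed as x, unevaluated
  y <- rlang::enquo(y)  # same for y
  dt %>%
    mutate(new1 = (!!x) + (!!y),   # inject the captured expressions
           new2 = cumsum(!!x))
}

result_quo <- data %>%
  group_by(groups) %>%
  myFunctionQuo(var1, var2)
```

In most cases {{ }} is the shorter way to write this; the explicit form is mainly useful for seeing what the shorthand does.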

CodePudding user response:

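If you convert the output of myFunction to a data frame and pass it to mutate as an unnamed argument, mutate unpacks it into one column per element automatically (this needs dplyr 1.0.0 or later), and the function is called only once per group: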

data %>% 
   group_by(groups) %>% 
   mutate(as.data.frame(myFunction(var1, var2)))

#> # A tibble: 5 x 5
#> # Groups:   groups [2]
#>    var1  var2 groups     a     b
#>   <dbl> <dbl> <chr>  <dbl> <dbl>
#> 1     1     2 a          3     1
#> 2     2     6 a          8     3
#> 3     4     2 b          6     4
#> 4     7     4 b         11    11
#> 5     2     6 b          8    13

Obviously you can do this inside your function to make the mutate call look nicer, or do it inside the mutate call as in my example, depending on how important it is that your function returns a list rather than a data frame.
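As a rough sketch of the first option, the conversion can be moved into the function itself. The name myFunctionDf is hypothetical, chosen here so it does not clash with the original list-returning myFunction.

```r
library(dplyr)

# Hypothetical variant of myFunction that returns a data frame instead of a list
myFunctionDf <- function(x, y) {
  data.frame(a = x + y,
             b = cumsum(x))
}

data <- data.frame(var1 = c(1, 2, 4, 7, 2),
                   var2 = c(2, 6, 2, 4, 6),
                   groups = c("a", "a", "b", "b", "b"))

# The unnamed data frame result is unpacked into columns a and b;
# the function runs once per group
result_df <- data %>%
  group_by(groups) %>%
  mutate(myFunctionDf(var1, var2))
```

The trade-off is that any other code relying on the list return value would need as.list() or similar, so keeping the list and wrapping it in as.data.frame() at the call site may be the less intrusive choice.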
