dplyr::mutate when custom function return a vector

Time:08-19

I am trying to use dplyr::mutate with group_by to create new columns from a custom function that returns a vector. The function takes a long time to run because it bootstraps.

I know this can be implemented in base R, but is there a more elegant way in dplyr?

Example (discarded):

library(dplyr)

f <- function(x) {
  return(c(2*x, x + 1))
}

iris %>% 
  group_by(Species) %>% 
  mutate(t1 = f(iris$Sepal.Length)[1], t2 = f(iris$Sepal.Length)[2])

Is it possible to create both columns while calling the function only once per group?


I made a mistake in the previous example. Please use this example instead:

Example:

library(dplyr)

f <- function(x) {
  return(c(x*2, x + 1))
}

iris %>% 
  group_by(Species) %>% 
  group_modify(~ {
    .x %>% 
      mutate(t1 := f(mean(.x$Sepal.Length))[1], t2 := f(mean(.x$Sepal.Length))[2])
  })
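The question notes that this can be done in base R. For comparison, here is a minimal base R sketch for the updated example, assuming the goal is exactly one call to `f` per species; the names `per_group`, `res`, `idx`, and `iris2` are illustrative and not from the original posts.

```r
# f as in the updated example: returns c(2 * x, x + 1)
f <- function(x) c(x * 2, x + 1)

# Call f exactly once per species on that species' mean Sepal.Length;
# each element of per_group is a length-2 numeric vector
per_group <- lapply(split(iris$Sepal.Length, iris$Species),
                    function(v) f(mean(v)))

# Stack into a 3 x 2 matrix: one row per species (row names come from the list names)
res <- do.call(rbind, per_group)
colnames(res) <- c("t1", "t2")

# Look up each row's species in res and add the two new columns
idx <- match(iris$Species, rownames(res))
iris2 <- iris
iris2$t1 <- unname(res[idx, "t1"])
iris2$t2 <- unname(res[idx, "t2"])

head(iris2, 3)
```

This works, but the lookup and column assignment are manual bookkeeping, which is what the dplyr answers avoid.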

Method 1:

Thanks to Darren Tsai for the answer! Using unnest_wider solves the problem for the new example:

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  group_modify(~ {
    .x %>% 
      mutate(t = list(f(mean(.x$Sepal.Length)))) %>% 
      unnest_wider(t, names_sep = "")
  })

# A tibble: 150 × 7
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width    t1    t2
   <fct>          <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl>
 1 setosa           5.1         3.5          1.4         0.2  10.0  6.01
 2 setosa           4.9         3            1.4         0.2  10.0  6.01
 3 setosa           4.7         3.2          1.3         0.2  10.0  6.01
 4 setosa           4.6         3.1          1.5         0.2  10.0  6.01
 5 setosa           5           3.6          1.4         0.2  10.0  6.01
 6 setosa           5.4         3.9          1.7         0.4  10.0  6.01
 7 setosa           4.6         3.4          1.4         0.3  10.0  6.01
 8 setosa           5           3.4          1.5         0.2  10.0  6.01
 9 setosa           4.4         2.9          1.4         0.2  10.0  6.01
10 setosa           4.9         3.1          1.5         0.1  10.0  6.01
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows
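The result is wrapped in `list()` because `mutate()` expects each new column to have length 1 or the group size, while `f()` returns a length-2 vector. Storing it as a one-element list column and then calling `unnest_wider()` spreads it into columns, and `names_sep = ""` builds the names `t1`, `t2`. A small sketch on made-up toy data (the tibble `toy` is for illustration only):

```r
library(tibble)
library(tidyr)

# Toy data: a list column whose elements are unnamed length-2 vectors,
# like the result of list(f(...)) inside mutate()
toy <- tibble(g = c("a", "b"), t = list(c(1, 2), c(3, 4)))

# Unnamed elements get names from the column name, names_sep, and position: t1, t2
wide <- unnest_wider(toy, t, names_sep = "")
wide
```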

Method 2:

Thanks to Konrad Rudolph for his advice! This is a more flexible approach:

to_tibble <- function (x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}
iris %>%
  group_by(Species) %>%
  mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))

Answer:

The issue with your code is that it passes a vector to f, so the result probably isn’t what you’re expecting:

f(1 : 5)
# [1]  2  4  6  8 10  2  3  4  5  6                                        

Your calling code will have to disentangle that.
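Because `f()` returns all the doubled values followed by all the incremented values, filling a two-column matrix column by column recovers the pairs. A minimal base R sketch of that idea:

```r
f <- function(x) c(x * 2, x + 1)

out <- f(1:5)  # length 10: the five doubled values, then the five incremented values

# matrix() fills column-wise by default, so column 1 gets 2 * x and column 2 gets x + 1
m <- matrix(out, ncol = 2, dimnames = list(NULL, c("t1", "t2")))
m
```

This relies on both halves of `f()`'s output having the same length, which holds for this `f`.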

You can do that, e.g. using the following helper:

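# Reshape a vector into a tibble with the given column names,
# filling the columns in order (column-wise, as matrix() does)
to_tibble <- function (x, colnames) {
    x %>%
        matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
        as_tibble()
}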

With that, you can now call your f inside mutate and provide target column names:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))

The advantage of this method is that it simplifies the calling code and uses mutate’s built-in support for producing multiple columns from a single data frame result, so no manual unnesting is required.
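The built-in support referred to here is that, since dplyr 1.0.0, an unnamed data frame returned inside `mutate()` is spliced into separate columns, and a one-row result is recycled to the group size. A toy sketch (the data and the column names `lo`/`hi` are made up for illustration):

```r
library(dplyr)
library(tibble)

toy <- tibble(g = c("a", "a", "b"), x = c(1, 3, 5))

res <- toy %>%
  group_by(g) %>%
  # One unnamed one-row tibble per group; mutate() turns it into columns lo and hi
  mutate(tibble(lo = min(x), hi = max(x))) %>%
  ungroup()
res
```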


Regarding your updated code/requirement, you can simplify that too using the helper function:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))

Answer:

You could store the result in a list column and unnest it into multiple columns with unnest_wider from tidyr.

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  mutate(t = list(f(mean(Sepal.Length)))) %>%
  unnest_wider(t, names_sep = "")

# A tibble: 150 × 7
# Groups:   Species [3]
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species       t1    t2
           <dbl>       <dbl>        <dbl>       <dbl> <fct>      <dbl> <dbl>
  1          5.1         3.5          1.4         0.2 setosa      10.0  6.01
  2          4.9         3            1.4         0.2 setosa      10.0  6.01
  3          4.7         3.2          1.3         0.2 setosa      10.0  6.01

Answer:

I don't have enough reputation to post this data.table solution as a comment, but with data.table you could do the following:

library(data.table)
setDT(iris)

# Returning a list lets := assign one element per new column
ff <- function(x) {
  return(list(2*x, x + 1))
}

iris[, c("t1","t2") := ff(Sepal.Length), by = "Species"]
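For the updated requirement (one call on the group mean), a similar data.table sketch could keep the question's vector-returning `f()` and convert its result with `as.list()` so that `:=` assigns one element per column. This adapts the answer and is not from the original post; it works on a copy, `dt`, instead of converting `iris` in place.

```r
library(data.table)

f <- function(x) c(x * 2, x + 1)

dt <- as.data.table(iris)

# f() runs once per Species; as.list() turns its length-2 result into
# two length-1 list elements, which := recycles across each group's rows
dt[, c("t1", "t2") := as.list(f(mean(Sepal.Length))), by = Species]

head(dt, 3)
```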

I would appreciate it if someone with more reputation could turn this into a comment.
