I am trying to use dplyr::mutate with group_by to group data and create new columns, using a custom function that returns a vector. The function takes a long time to bootstrap.
I know this can be implemented in base R, but is there a more elegant way in dplyr?
Example (discarded):
iris %>%
  group_by(Species) %>%
  mutate(t1 = f(iris$Sepal.Length)[1], t2 = f(iris$Sepal.Length)[2])
f <- function(x) {
  return(c(2*x, x + 1))
}
Is it possible to create two columns while calling the function only once in each group?
I made a mistake in the previous example. Please check this example instead:
Example:
f <- function(x) {
  return(c(x*2, x + 1))
}
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      mutate(t1 = f(mean(.x$Sepal.Length))[1], t2 = f(mean(.x$Sepal.Length))[2])
  })
Method 1:
Thanks to Darren Tsai for the answer! The problem is solved using unnest_wider in the new example:
library(dplyr)
library(tidyr)
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      mutate(t = list(f(mean(.x$Sepal.Length)))) %>%
      unnest_wider(t, names_sep = "")
  })
# A tibble: 150 × 7
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width    t1    t2
   <fct>          <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl>
 1 setosa           5.1         3.5          1.4         0.2  10.0  6.01
 2 setosa           4.9         3            1.4         0.2  10.0  6.01
 3 setosa           4.7         3.2          1.3         0.2  10.0  6.01
 4 setosa           4.6         3.1          1.5         0.2  10.0  6.01
 5 setosa           5           3.6          1.4         0.2  10.0  6.01
 6 setosa           5.4         3.9          1.7         0.4  10.0  6.01
 7 setosa           4.6         3.4          1.4         0.3  10.0  6.01
 8 setosa           5           3.4          1.5         0.2  10.0  6.01
 9 setosa           4.4         2.9          1.4         0.2  10.0  6.01
10 setosa           4.9         3.1          1.5         0.1  10.0  6.01
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows
Method 2:
Thanks to Konrad Rudolph for his advice! This is a more flexible approach to the question:
to_tibble <- function (x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}
iris %>%
  group_by(Species) %>%
  mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))
CodePudding user response:
The issue with your code is that it passes a vector to f, so the result probably isn’t what you’re expecting:
f(1 : 5)
# [1] 2 4 6 8 10 2 3 4 5 6
Your calling code will have to disentangle that.
You can do that, e.g. using the following helper:
to_tibble <- function (x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}
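To make the reshaping step concrete, here is a minimal sketch of what the helper returns for the vector shown above. It assumes f is defined as c(2 * x, x + 1), as in the question, and that dplyr is attached (it re-exports %>% and as_tibble):

```r
library(dplyr)

# Assumed definition of f, taken from the question.
f <- function(x) c(2 * x, x + 1)

to_tibble <- function(x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}

# matrix() fills column by column, so the first half of the vector
# becomes t1 and the second half becomes t2.
res <- to_tibble(f(1:5), c("t1", "t2"))
res$t1  # 2 4 6 8 10
res$t2  # 2 3 4 5 6
```

This only works because f returns its two results concatenated end to end; a function that interleaved them would need matrix(..., byrow = TRUE) instead.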
With that, you can now call your f inside mutate and provide target column names:
iris %>%
  group_by(Species) %>%
  mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))
The advantage of this method is that it simplifies the calling code and harnesses mutate’s built-in support for producing multiple columns, so no manual unnesting is required.
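If you want to check that the function really runs only once per group, a rough sketch like the following may help. The calls counter and the inline tibble() are illustrative additions, not part of the original code, and f is assumed to be c(2 * x, x + 1) as in the question:

```r
library(dplyr)

calls <- 0
f <- function(x) {
  calls <<- calls + 1  # count invocations, for illustration only
  c(2 * x, x + 1)
}

res <- iris %>%
  group_by(Species) %>%
  mutate({
    v <- f(mean(Sepal.Length))    # a single call per group
    tibble(t1 = v[1], t2 = v[2])  # unnamed data frame is unpacked into columns
  }) %>%
  ungroup()

calls       # should match the number of groups
names(res)  # original columns plus t1 and t2
```

The one-row tibble is recycled to the size of each group, which is why both new columns are constant within a species.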
Regarding your updated code/requirement, you can simplify that too using the helper function:
iris %>%
  group_by(Species) %>%
  mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))
CodePudding user response:
You could store the mutated values in a list column and unnest them into multiple columns with unnest_wider from tidyr.
library(dplyr)
library(tidyr)
iris %>%
  group_by(Species) %>%
  mutate(t = list(f(mean(Sepal.Length)))) %>%
  unnest_wider(t, names_sep = "")
# A tibble: 150 × 7
# Groups:   Species [3]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    t1    t2
         <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
1          5.1         3.5          1.4         0.2 setosa   10.0  6.01
2          4.9         3            1.4         0.2 setosa   10.0  6.01
3          4.7         3.2          1.3         0.2 setosa   10.0  6.01
CodePudding user response:
I don't have enough reputation to comment this data.table solution, but using data.table you could do the following:
library(data.table)
# Work on a data.table copy instead of converting the built-in iris by reference
dt <- as.data.table(iris)
ff <- function(x) {
  return(list(2*x, x + 1))
}
dt[, c("t1","t2") := ff(Sepal.Length), by = "Species"]
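For the updated requirement (one call on the group mean), a data.table version might look like the sketch below. It assumes the question's f, which returns a length-2 vector, and wraps the result in as.list() so that := sees one element per target column:

```r
library(data.table)

# Assumed definition of f, taken from the question.
f <- function(x) c(2 * x, x + 1)

dt <- as.data.table(iris)

# as.list() turns the length-2 result into a list of two scalars,
# which := recycles across the rows of each group.
dt[, c("t1", "t2") := as.list(f(mean(Sepal.Length))), by = Species]
head(dt, 3)
```

Here f is evaluated once per group, since the j expression runs once for each value of Species.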
I would appreciate it if someone with more reputation could make this a comment.