I'm using mutate_if to modify columns of some data frames in my workspace. When using plain mutate, I can create variables based on ones created earlier in the same call, e.g.
x %>%
  mutate(new = column_a * 2,
         new_2 = new * 2)
But this approach doesn't work with mutate_if, so I have to use a kind of 'recursive' method, rebuilding each variable from scratch, e.g.
mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'),
          .funs = list(Real = ~ . / Deflator,
                       Real_YoY = ~ (((. / Deflator) / lag((. / Deflator), 12)) - 1) * 100))
What I would like to write is something like:
mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'),
          .funs = list(Real = ~ . / Deflator,
                       Real_YoY = ~ ((Real / lag(Real, 12)) - 1) * 100))
Is there some way to organize the code to get close to this? Thank you!
Reproducible example:
x <- data.frame(x = seq(1, 10),
                x1 = seq(21, 30),
                y = seq(10, 19))
x %>% mutate_if(str_detect(colnames(.), 'x'),
                .funs = list(new = ~ (. * 2),
                             new2 = ~ (. * 2) * 4)) # where (. * 2) could make reference to the variable 'new'
CodePudding user response:
You need to do this in two mutate calls. across is not aware of the new columns it creates within the same call. For example, even if you refer to a specific column you know will be created, this causes an error:
x %>%
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2),
      new2 = x_new
    )
  ))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(.cols = contains("x"), .fns =
#> list(new = ~(.x * 2), new2 = x_new))`.
#> Caused by error:
#> ! object 'x_new' not found
The second issue is that you need to make sure each calculation uses the appropriate *_new column. This can be done by using cur_column() to build a symbol, which is then evaluated in the context of the data.frame.
x %>%
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2)
    )
  )) %>%
  mutate(across(
    .cols = matches("x[[:digit:]]?$"),
    .fns = list(
      new2 = ~eval(as.symbol(paste0(cur_column(), "_new"))) * 4
    )
  ))
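As a possible simplification of the two-call idea above, the second call could select the freshly created *_new columns directly rather than rebuilding their names with cur_column(). This is only a sketch using the question's example data; the "{.col}2" naming pattern is an assumption chosen to reproduce the x_new2 / x1_new2 names.

```r
library(dplyr)

# Example data from the question
x <- data.frame(x = seq(1, 10),
                x1 = seq(21, 30),
                y = seq(10, 19))

res <- x %>%
  # First call: create x_new and x1_new
  mutate(across(contains("x"), list(new = ~ .x * 2))) %>%
  # Second call: the *_new columns now exist, so select them directly
  # and append "2" to each name (x_new -> x_new2)
  mutate(across(ends_with("_new"), ~ .x * 4, .names = "{.col}2"))

names(res)
```

This avoids eval(as.symbol(...)) because each lambda receives its own *_new column as .x, at the cost of relying on the "_new" suffix being unique to the columns created in the first call.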
CodePudding user response:
Instead of a list, return a tibble, which can also refer to the previous column value by name, and then unnest the tibble columns.
library(dplyr)
library(tidyr)
x %>%
  mutate(across(starts_with('x'),
                ~ tibble(`1` = (.x * 2),
                         `2` = `1` * 4), .names = "{.col}_new")) %>%
  unnest(where(is_tibble), names_sep = "")
-output
# A tibble: 10 × 7
x x1 y x_new1 x_new2 x1_new1 x1_new2
<int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 1 21 10 2 8 42 168
2 2 22 11 4 16 44 176
3 3 23 12 6 24 46 184
4 4 24 13 8 32 48 192
5 5 25 14 10 40 50 200
6 6 26 15 12 48 52 208
7 7 27 16 14 56 54 216
8 8 28 17 16 64 56 224
9 9 29 18 18 72 58 232
10 10 30 19 20 80 60 240
Or create a one-column tibble first and then use mutate on it:
x %>%
  transmute(across(starts_with('x'), ~ tibble(new1 = .x * 2) %>%
                     mutate(new2 = new1 * 4))) %>%
  unnest(where(is_tibble), names_sep = "_") %>%
  bind_cols(x, .)
-output
x x1 y x_new1 x_new2 x1_new1 x1_new2
1 1 21 10 2 8 42 168
2 2 22 11 4 16 44 176
3 3 23 12 6 24 46 184
4 4 24 13 8 32 48 192
5 5 25 14 10 40 50 200
6 6 26 15 12 48 52 208
7 7 27 16 14 56 54 216
8 8 28 17 16 64 56 224
9 9 29 18 18 72 58 232
10 10 30 19 20 80 60 240
Or wrap the multiple statements in a {} block:
x %>%
  mutate(across(starts_with('x'), ~ {
    new <- .x * 2
    new2 <- new * 4
    tibble(new, new2)
  }, .names = "{.col}_")) %>%
  unnest(where(is_tibble), names_sep = "")
# A tibble: 10 × 7
x x1 y x_new x_new2 x1_new x1_new2
<int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 1 21 10 2 8 42 168
2 2 22 11 4 16 44 176
3 3 23 12 6 24 46 184
4 4 24 13 8 32 48 192
5 5 25 14 10 40 50 200
6 6 26 15 12 48 52 208
7 7 27 16 14 56 54 216
8 8 28 17 16 64 56 224
9 9 29 18 18 72 58 232
10 10 30 19 20 80 60 240
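Applied to the Real / Real_YoY case from the question, the {} pattern might look roughly like the sketch below. The data here is hypothetical (a made-up sales column and a two-level Deflator), since the question does not show the real data frame; only the column-name pattern and the formulas are taken from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical monthly data; `sales` and the values are made up for illustration
df <- tibble(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  Deflator = rep(c(1, 2), each = 12),
  sales = rep(c(100, 300), each = 12)
)

out <- df %>%
  mutate(across(
    # Same exclusion pattern as in the question
    !matches("date|PIB|Deflator|[$]"),
    ~ {
      Real <- .x / Deflator                                # computed once
      Real_YoY <- (Real / dplyr::lag(Real, 12) - 1) * 100  # reuses Real
      tibble(Real, Real_YoY)
    },
    .names = "{.col}_"
  )) %>%
  # Spread each tibble column into <col>_Real and <col>_Real_YoY
  unnest(where(is.data.frame), names_sep = "")

names(out)
```

Here Real is computed once per column and reused for Real_YoY, which is the behaviour the question asks for. where(is.data.frame) is used instead of where(is_tibble) only so that the sketch does not depend on which packages export is_tibble.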