dplyr::mutate_if - Using created variables to build new ones

I'm using mutate_if to modify columns of some data frames in my workspace. With plain mutate I can create variables based on ones created earlier in the same call, e.g.

x %>% 
mutate(new = column_a * 2,
       new_2 = new * 2)

But this approach doesn't work with mutate_if, so I have to use a kind of 'recursive' method, building each variable from scratch, e.g.

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ (((. / Deflator) / lag((. / Deflator), 12))-1) * 100))

What I would like to write is something like this:

mutate_if(!str_detect(names(.), 'date|PIB|Deflator|[$]'), 
          .funs = list(Real =     ~ . / Deflator, 
                       Real_YoY = ~ ((Real / lag(Real, 12))-1) * 100))

Is there some way to organize the code to get close to this? Thank you!

Reproducible example:

 x <- data.frame(x = seq(1,10),
                 x1 = seq(21,30),
                 y = seq(10,19))
 
 x %>% mutate_if(str_detect(colnames(.), 'x'), 
                 .funs = list(new = ~ (. * 2),
                              new2 = ~ (. * 2) * 4)) # where (. * 2) could make reference to the variable 'new'

CodePudding user response：

You need to do this in two mutate() calls. across() (which supersedes mutate_if() in dplyr 1.0.0 and later) is not aware of columns created within the same across() call. For example, even if you refer to a specific column that you know will be created, you get an error:

x %>% 
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2),
      new2 = x_new
    )
  ))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(.cols = contains("x"), .fns =
#>   list(new = ~(.x * 2), new2 = x_new))`.
#> Caused by error:
#> ! object 'x_new' not found

The second issue is making sure each calculation refers to the matching *_new column. You can do this by using cur_column() to build the column name, converting it to a symbol, and evaluating that symbol in the context of the data frame.

x %>% 
  mutate(across(
    .cols = contains('x'),
    .fns = list(
      new = ~(.x*2)
    )
  )) %>%
  mutate(across(
    .cols = matches("x[[:digit:]]?$"),
    .fns = list(
      new2 = ~eval(as.symbol(paste0(cur_column(), "_new"))) * 4
    )
  ))
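
Applied to the Deflator case from the question, the same two-pass idea might look like the sketch below. The data frame df and its columns sales and costs are made up for illustration; only the Deflator name and the regular expression come from the question. Instead of rebuilding names with cur_column(), the second pass here simply selects the *_Real columns created by the first pass, which is a slightly different variant and assumes no pre-existing column already ends in "_Real".

```r
library(dplyr)

# Hypothetical monthly data; sales and costs are illustrative column names
df <- data.frame(
  date     = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  Deflator = seq(1, by = 0.01, length.out = 24),
  sales    = seq(100, by = 5, length.out = 24),
  costs    = seq(80, by = 2, length.out = 24)
)

out <- df %>%
  # Pass 1: deflate every column whose name does not match the pattern
  mutate(across(
    .cols = !matches("date|PIB|Deflator|[$]"),
    .fns  = list(Real = ~ .x / Deflator)
  )) %>%
  # Pass 2: year-over-year change (%) computed from the new *_Real columns
  mutate(across(
    .cols = ends_with("_Real"),
    .fns  = list(YoY = ~ (.x / dplyr::lag(.x, 12) - 1) * 100)
  ))

names(out)
```

With the default naming scheme ({.col}_{.fn}) this should give sales_Real, costs_Real, sales_Real_YoY and costs_Real_YoY, with the first 12 YoY values being NA because of the 12-period lag.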

CodePudding user response：

Instead of a list of functions, return a tibble from the function: inside tibble() a later column can refer to an earlier one by name. Then unnest the resulting tibble columns.

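library(dplyr)
library(tidyr)
library(tibble)  # for is_tibble()

x %>% 
  mutate(across(starts_with('x'), 
                # `2` can refer to `1` because tibble() builds columns in order
                ~ tibble(`1` = .x * 2,
                         `2` = `1` * 4),
                .names = "{.col}_new")) %>% 
  # is_tibble() replaces the deprecated is.tibble()
  unnest(where(is_tibble), names_sep = "")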

-output

# A tibble: 10 × 7
       x    x1     y x_new1 x_new2 x1_new1 x1_new2
   <int> <int> <int>  <dbl>  <dbl>   <dbl>   <dbl>
 1     1    21    10      2      8      42     168
 2     2    22    11      4     16      44     176
 3     3    23    12      6     24      46     184
 4     4    24    13      8     32      48     192
 5     5    25    14     10     40      50     200
 6     6    26    15     12     48      52     208
 7     7    27    16     14     56      54     216
 8     8    28    17     16     64      56     224
 9     9    29    18     18     72      58     232
10    10    30    19     20     80      60     240

Alternatively, build a one-column tibble inside across() and call mutate() on it:

x %>%
  transmute(across(starts_with('x'),
                   ~ tibble(new1 = .x * 2) %>% 
                     mutate(new2 = new1 * 4))) %>%
  unnest(where(is_tibble), names_sep = "_") %>% 
  bind_cols(x, .)

-output

    x x1  y x_new1 x_new2 x1_new1 x1_new2
1   1 21 10      2      8      42     168
2   2 22 11      4     16      44     176
3   3 23 12      6     24      46     184
4   4 24 13      8     32      48     192
5   5 25 14     10     40      50     200
6   6 26 15     12     48      52     208
7   7 27 16     14     56      54     216
8   8 28 17     16     64      56     224
9   9 29 18     18     72      58     232
10 10 30 19     20     80      60     240

Or wrap the multiple statements in a {} block:

x %>%
  mutate(across(starts_with('x'), ~ {
    new <- .x * 2
    new2 <- new * 4
    tibble(new, new2)
  }, .names = "{.col}_")) %>% 
  unnest(where(is_tibble), names_sep = "")

-output

# A tibble: 10 × 7
       x    x1     y x_new x_new2 x1_new x1_new2
   <int> <int> <int> <dbl>  <dbl>  <dbl>   <dbl>
 1     1    21    10     2      8     42     168
 2     2    22    11     4     16     44     176
 3     3    23    12     6     24     46     184
 4     4    24    13     8     32     48     192
 5     5    25    14    10     40     50     200
 6     6    26    15    12     48     52     208
 7     7    27    16    14     56     54     216
 8     8    28    17    16     64     56     224
 9     9    29    18    18     72     58     232
10    10    30    19    20     80     60     240
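
For the Deflator case in the question, the {} block approach could be sketched roughly as follows. The data frame df and the columns sales and costs are hypothetical; Deflator and the regular expression are taken from the question. where(is.data.frame) is used here instead of where(is_tibble) only so the sketch runs with just dplyr and tidyr attached.

```r
library(dplyr)
library(tidyr)

# Hypothetical monthly data; sales and costs are illustrative column names
df <- data.frame(
  date     = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  Deflator = seq(1, by = 0.01, length.out = 24),
  sales    = seq(100, by = 5, length.out = 24),
  costs    = seq(80, by = 2, length.out = 24)
)

out <- df %>%
  mutate(across(
    .cols = !matches("date|PIB|Deflator|[$]"),
    .fns  = ~ {
      # Real is computed once and reused for the year-over-year change
      Real     <- .x / Deflator
      Real_YoY <- (Real / dplyr::lag(Real, 12) - 1) * 100
      tibble(Real, Real_YoY)
    },
    .names = "{.col}_"
  )) %>%
  # Spread each packed tibble column into <col>_Real and <col>_Real_YoY
  unnest(where(is.data.frame), names_sep = "")

names(out)
```

This keeps the single-pass structure the question asks for: Real is defined once per column and Real_YoY is written in terms of it, at the cost of the extra unnest() step.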