In dplyr mutate across, is it possible to use non-referenced columns in a programmable fashion?

Time:11-03

Suppose I have this tibble with an arbitrary number of variable pairs x and x_var, y and y_var, etc.

dt <- tibble(x = 1:3,
       y = 2:4,
       z = 3:5,
       x_var = rep(0.1, 3),
       y_var = rep(0.2, 3),
       z_var = rep(0.3, 3))

I was attempting to calculate x + x_var, y + y_var, etc. all in one go, using mutate() with across().

I tried

dt %>%
  mutate(across(.cols = all_of(c("x", "y", "z")),
                .names = "{col}_sum",
                function(x) x + !!rlang::sym(paste0(cur_column(), "_var"))))

but this does not seem to work. I do not want to hard-code variable names. I see that it can be done via pivoting, but I'm curious whether mutate() with across() can do the trick somehow.

CodePudding user response:

You’re on the right track with paste0(cur_column(), "_var"). Instead of using sym(), use your computed column name to index into cur_data():

library(dplyr)

dt %>%
  mutate(across(
    .cols = c(x, y, z),
    # look up the matching "<name>_var" column in the current data
    .fns = \(x) x + cur_data()[[paste0(cur_column(), "_var")]],
    .names = "{col}_sum"
  ))
# A tibble: 3 × 9
      x     y     z x_var y_var z_var x_sum y_sum z_sum
  <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3   0.1   0.2   0.3   1.1   2.2   3.3
2     2     3     4   0.1   0.2   0.3   2.1   3.2   4.3
3     3     4     5   0.1   0.2   0.3   3.1   4.2   5.3

CodePudding user response:

If the two sets of columns are in the same order, we could add the two across() results directly:

library(dplyr)
dt %>%
   mutate(across(x:z, .names = "{.col}_sum") +
           across(ends_with("_var")))

-output

# A tibble: 3 × 9
      x     y     z x_var y_var z_var x_sum y_sum z_sum
  <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3   0.1   0.2   0.3   1.1   2.2   3.3
2     2     3     4   0.1   0.2   0.3   2.1   3.2   4.3
3     3     4     5   0.1   0.2   0.3   3.1   4.2   5.3

Or another option is to loop across one set and then modify the OP's code by retrieving the value of the pasted column name with get:

dt %>%
    mutate(across(x:z, ~ .x +
     get(paste0(cur_column(), "_var")), .names = "{.col}_sum"))

-output

# A tibble: 3 × 9
      x     y     z x_var y_var z_var x_sum y_sum z_sum
  <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3   0.1   0.2   0.3   1.1   2.2   3.3
2     2     3     4   0.1   0.2   0.3   2.1   3.2   4.3
3     3     4     5   0.1   0.2   0.3   3.1   4.2   5.3

Or use dplyover

library(dplyover)
library(stringr)
dt %>%
    mutate(across2(x:z, x_var:z_var, ~ .x + .y,
     .names_fn = ~ str_replace(.x, "_._var", "_sum")))

-output

# A tibble: 3 × 9
      x     y     z x_var y_var z_var x_sum y_sum z_sum
  <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3   0.1   0.2   0.3   1.1   2.2   3.3
2     2     3     4   0.1   0.2   0.3   2.1   3.2   4.3
3     3     4     5   0.1   0.2   0.3   3.1   4.2   5.3

Or this is much simpler in base R

dt[paste0(names(dt)[1:3], "_sum")] <- dt[1:3] + dt[4:6]
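
The base R one-liner above relies on column positions (1:3 and 4:6). If the column order is not guaranteed, a possible variant is to pair the columns by name instead. This is only a sketch: base_cols is a hypothetical helper vector introduced here for illustration, and a plain data.frame is used so that no packages are needed.

```r
# Same data as in the question, built as a plain data.frame
dt <- data.frame(x = 1:3, y = 2:4, z = 3:5,
                 x_var = rep(0.1, 3),
                 y_var = rep(0.2, 3),
                 z_var = rep(0.3, 3))

# base_cols is an assumed helper: the "stem" names of each variable pair
base_cols <- c("x", "y", "z")

# Select both sets by name, so the result does not depend on column order;
# adding two data frames of equal shape works element-wise
dt[paste0(base_cols, "_sum")] <- dt[base_cols] + dt[paste0(base_cols, "_var")]
```

Because both sides are selected from the same base_cols vector, x is always paired with x_var, y with y_var, and so on, even if the columns are shuffled.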