Using a generated list of functions for mutate(across()) vs mutate_at

Time:04-06

I'm working off this answer that describes the use of mutate_at and supplying a list of functions applied to a column. I have modified the code in that answer and have a working example that seems to produce the quantities I am looking for (growth rates of a variable over different intervals):

library(tidyverse)

set.seed(1)

## data
df <- data.frame(t = 1:10, y = runif(10))
lags <- c(1, 3, 5)

df %>% mutate_at(vars(y), .funs = {
  map(lags, function(i) ~ (.x - lag(.x, n = i)) / lag(.x, n = i)) %>%
    setNames(sprintf("growth_%1i", lags))
})

#     t          y    growth_1    growth_3   growth_5
# 1   1 0.26550866          NA          NA         NA
# 2   2 0.37212390  0.40155088          NA         NA
# 3   3 0.57285336  0.53941567          NA         NA
# 4   4 0.90820779  0.58541059  2.42063336         NA
# 5   5 0.20168193 -0.77793415 -0.45802478         NA
# 6   6 0.89838968  3.45448772  0.56827164  2.3836549
# 7   7 0.94467527  0.05152061  0.04015323  1.5386041
# 8   8 0.66079779 -0.30050271  2.27643527  0.1535200
# 9   9 0.62911404 -0.04794772 -0.29973145 -0.3073016
# 10 10 0.06178627 -0.90178844 -0.93459523 -0.6936450
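The growth columns are simple period-over-period growth rates. Written out below, with the notation g_k(t) introduced here only for illustration, and with row 2 of growth_1 checked by hand against the output above:

```latex
g_k(t) = \frac{y_t - y_{t-k}}{y_{t-k}}, \qquad k \in \{1, 3, 5\}

g_1(2) = \frac{0.37212390 - 0.26550866}{0.26550866}
       = \frac{0.10661524}{0.26550866} \approx 0.40155
```

For t ≤ k the lagged value y_{t-k} does not exist (lag() pads with NA), which is why the first k rows of each growth_k column are NA.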

However, since mutate_at has been superseded by the across syntax, and for consistency with the rest of my code, I have been trying to get a working version with the new syntax. I have code that runs but doesn't produce the new columns, and I haven't been able to figure out why.


df %>% mutate(across(y, .funs = {
  map(lags, function(i) ~ (.x - lag(.x, n = i)) / lag(.x, n = i)) %>%
    setNames(sprintf("growth_%1i", lags))
}))

#     t          y
# 1   1 0.26550866
# 2   2 0.37212390
# 3   3 0.57285336
# 4   4 0.90820779
# 5   5 0.20168193
# 6   6 0.89838968
# 7   7 0.94467527
# 8   8 0.66079779
# 9   9 0.62911404
# 10 10 0.06178627

I had previously tried generating the list of functions outside the mutate call but couldn't get that to work. I thought the issue with the current code might be the placement of parentheses, braces, etc., but adjusting those hasn't resolved the problem. Any insights are appreciated.
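A list of functions generated outside mutate() is ordinary R data, and each generated formula keeps its own lag value because it captures the environment of the function(i) call that created it. The sketch below (an illustration, not code from the question; growth_fns, f1 and f3 are hypothetical names) builds such a list and converts entries with rlang::as_function(), which is roughly how dplyr interprets the `~` shorthand.

```r
library(purrr)

lags <- c(1, 3, 5)

# Each call of function(i) creates a fresh environment holding that i;
# the formula returned from it captures that environment.
growth_fns <- map(lags, function(i) {
  force(i)  # evaluate i now so each formula pins its own lag value
  ~ (.x - dplyr::lag(.x, n = i)) / dplyr::lag(.x, n = i)
})

# Turn formulas into regular functions evaluated in the formula's environment
f1 <- rlang::as_function(growth_fns[[1]])  # lag 1
f3 <- rlang::as_function(growth_fns[[2]])  # lag 3

f1(c(1, 2, 4, 8))  # returns c(NA, 1, 1, 1)
f3(c(1, 2, 4, 8))  # returns c(NA, NA, NA, 7)
```

So generating the list outside the call is not the problem in itself; what matters is how the list is handed to across().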

Answer:

It is much easier to compute the growth columns outside mutate() and then bind them to the original data than to create a list or tibble column inside across() and then unnest it.

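library(purrr)
library(stringr)
library(dplyr)

# One single-column data frame per lag (named growth_<lag>),
# column-bound together and then appended to the original df
map_dfc(lags, ~ df %>%
  transmute(!!str_c("growth_", .x) := (y - lag(y, n = .x)) / lag(y, n = .x))) %>%
  bind_cols(df, .)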

Output (note that y here comes from a different random draw than the question's data):

    t         y   growth_1   growth_3   growth_5
1   1 0.8696908         NA         NA         NA
2   2 0.3403490 -0.6086552         NA         NA
3   3 0.4820801  0.4164288         NA         NA
4   4 0.5995658  0.2437058 -0.3105989         NA
5   5 0.4935413 -0.1768355  0.4501036         NA
6   6 0.1862176 -0.6226910 -0.6137206 -0.7858807
7   7 0.8273733  3.4430457  0.3799541  1.4309557
8   8 0.6684667 -0.1920615  0.3544292  0.3866300
9   9 0.7942399  0.1881517  3.2651170  0.3246917
10 10 0.1079436 -0.8640919 -0.8695346 -0.7812876

If you want to use across(), have the function return a data frame (one column per lag) and then unnest it:

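library(tidyr)
df %>%
  # the function returns a data frame with one column per lag,
  # stored as a data-frame column named "growth"
  mutate(across(y, function(.x)
    map_dfc(lags, function(i) (.x - lag(.x, i)) / lag(.x, i)),
    .names = "growth")) %>%
  # spread the packed columns out, then rename them after the lags
  unnest(growth, names_sep = "_") %>%
  rename_with(~ str_c("growth_", lags), starts_with("growth"))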

Output:

# A tibble: 10 × 5
       t     y growth_1 growth_3 growth_5
   <int> <dbl>    <dbl>    <dbl>    <dbl>
 1     1 0.870   NA       NA       NA    
 2     2 0.340   -0.609   NA       NA    
 3     3 0.482    0.416   NA       NA    
 4     4 0.600    0.244   -0.311   NA    
 5     5 0.494   -0.177    0.450   NA    
 6     6 0.186   -0.623   -0.614   -0.786
 7     7 0.827    3.44     0.380    1.43 
 8     8 0.668   -0.192    0.354    0.387
 9     9 0.794    0.188    3.27     0.325
10    10 0.108   -0.864   -0.870   -0.781
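A likely reason the across() attempt in the question returns y unchanged: across() takes its functions through the `.fns` argument, while `.funs` is the mutate_at() argument name, so the generated list never reaches across(). The sketch below assumes dplyr >= 1.0 and uses a hypothetical helper list growth_fns built outside mutate(); `.names = "{.fn}"` is needed because the default naming would produce y_growth_1 and so on.

```r
library(dplyr)
library(purrr)

set.seed(1)
df <- data.frame(t = 1:10, y = runif(10))
lags <- c(1, 3, 5)

# Named list of functions, one per lag; the names become the new columns
growth_fns <- map(lags, function(i) {
  force(i)  # pin this lag value inside the closure
  function(x) (x - dplyr::lag(x, n = i)) / dplyr::lag(x, n = i)
}) %>%
  setNames(paste0("growth_", lags))

# `.fns` (not `.funs`) receives the list; "{.fn}" names each output
# column after its function only, without the "y_" prefix
res <- df %>% mutate(across(y, .fns = growth_fns, .names = "{.fn}"))

names(res)  # "t" "y" "growth_1" "growth_3" "growth_5"
```

With the same seed, the growth_* columns should match the mutate_at() output in the question.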