Home > database >  Creating multiple new columns using mutate() and across() in R
Creating multiple new columns using mutate() and across() in R

Time:04-11

I would like to perform the following calculation on many columns at the same time while they are grouped by ID:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04))

The data set entails a column for Assets.YYYY.MM and Returns.YYYY.MM for each month from 2018.04 to 2022.02 and I would like to create a Flows column for each of those.

I know that I could do it like this for every column:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05"= Assets.2018.05 - 
           (Assets.2018.04 * Returns.2018.05))

But as I want to do this calculation for 50 columns I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function but I was not able to figure out how to do this.

I would like the new columns to be named Flows.YYYY.MM which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.

I have also thought about converting the data frame from wide format to long format to perform this calculation, however this seemed even more complicated to me.

Any suggestions on achieving the desired outcome?

Please find below the sample data, as requested:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

The desired outcome is:

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274

CodePudding user response:

How about this:

  library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets)*Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46

Created on 2022-04-10 by the reprex package (v2.0.1)

  • Related