I would like to perform the following calculation on many columns at the same time while they are grouped by ID:
df <- df %>%
group_by(Id) %>%
mutate("Flows.2018.04"= Assets.2018.04 -
(Assets.2018.03 * Returns.2018.04))
The data set entails a column for Assets.YYYY.MM and Returns.YYYY.MM for each month from 2018.04 to 2022.02 and I would like to create a Flows column for each of those.
I know that I could do it like this for every column:
df <- df %>%
group_by(Id) %>%
mutate("Flows.2018.04"= Assets.2018.04 -
(Assets.2018.03 * Returns.2018.04)) %>%
mutate("Flows.2018.05"= Assets.2018.05 -
(Assets.2018.04 * Returns.2018.05))
But as I want to do this calculation for 50 columns I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function but I was not able to figure out how to do this.
I would like the new columns to be named Flows.YYYY.MM which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.
I have also thought about converting the data frame from wide format to long format to perform this calculation, however this seemed even more complicated to me.
Any suggestions on achieving the desired outcome?
Please find below the sample data, as requested:
library(tidyverse)
df <- data.frame(
ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df
ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55 5000 2345 3459 1.03 0.94
2 6F55 3000 1926 6933 0.77 1.11
3 ANE3 5870 8563 1533 1.01 0.89
4 ANE3 4098 9373 4556 0.97 1.02
5 6F55 9878 7432 9855 1.06 1.02
The desired outcome is:
ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55 5000 2345 3459 1.03 0.94 -2805 1255
2 6F55 3000 1926 6933 0.77 1.11 -384 4795
3 ANE3 5870 8563 1533 1.01 0.89 2634 -6088
4 ANE3 4098 9373 4556 0.97 1.02 5398 -5004
5 6F55 9878 7432 9855 1.06 1.02 -3039 2274
CodePudding user response:
How about this:
library(tidyverse)
df <- data.frame(
ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))
df %>%
pivot_longer(-ID,
names_to = c(".value", "date"),
names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>%
arrange(ID, date) %>%
group_by(ID, date) %>%
mutate(obs = seq_along(date)) %>%
group_by(ID, obs) %>%
mutate(Flow = Assets - (lag(Assets)*Returns)) %>%
pivot_wider(names_from = "date",
values_from = c("Assets", "Returns", "Flow")) %>%
as.data.frame()
#> ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55 1 5000 2345 3459 NA
#> 2 6F55 2 3000 1926 6933 NA
#> 3 6F55 3 9878 7432 9855 NA
#> 4 ANE3 1 5870 8563 1533 NA
#> 5 ANE3 2 4098 9373 4556 NA
#> Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1 1.03 0.94 NA -2805.00 1254.70
#> 2 0.77 1.11 NA -384.00 4795.14
#> 3 1.06 1.02 NA -3038.68 2274.36
#> 4 1.01 0.89 NA 2634.30 -6088.07
#> 5 0.97 1.02 NA 5397.94 -5004.46
Created on 2022-04-10 by the reprex package (v2.0.1)