I have a data set where I have a unique proposal ID, application year & financial statement year. One proposal ID shall have one application year(t) & could have t-1 &(or) t-2 financial year statements. I have multiple columns for debt, equity, networth etc & want to have two columns for YOY growth -F1 & YOY growth-2.
dataset :
Proposal ID Application Year Financial statement year Net sales
P1 2019 2019 100
P1 2019 2018 120
P1 2019 2017 130
Now basis each proposal ID I need additional columns on growth rates between financial statement years against my application year
desired output :
Proposal ID Application Year Financial statement year Net sales YOY - netsales-g1
P1 2019 2019 100 (100-120)/120...
P1 2019 2018 120
P1 2019 2017 130
this same step I need to do for all columns I have.
What I want is a function -- for each proposal ID it estimates the YOY growth & take out the latest application date as the final row with columns as YOY growth for all numeric variables in dataset
Thank you in advance for the help! :)
CodePudding user response:
This can be done use the dplyr::lead()
formula in mutate()
. The jantior::clean_names()
is optional to make the code writing easier.
df %>%
janitor::clean_names() %>%
mutate(YoY_net_sales=(net_sales-lead(net_sales,n=1L))/lead(net_sales,n=1L))
If you found this helpful, please upvote or select as answer
CodePudding user response:
I am not sure but is it what you need?
library(dplyr)
library(tidyverse)
data %>% arrange(Financial_Statement_Year) %>%
mutate(Growth_Difference = Net_Sales - lag(Net_Sales)) %>%
mutate(Growth_Rate = (Growth_Difference / Net_Sales) * 100)
Proposal_ID | Application_Year | Financial_Statement_Year | Net_Sales | Growth_Difference | Growth_Rate |
---|---|---|---|---|---|
P3 | 2019 | 2017 | 130 | NA | NA |
P2 | 2019 | 2018 | 120 | -10 | -8.333 |
P1 | 2019 | 2019 | 100 | -20 | -20.000 |