Finding YTD change in R

Time:01-27

Hi, I am trying to compute the year-to-date (YTD) change. The YTD formula is (current month value / value for the last month of the previous year) - 1. The result I would like to get is shown in column y.

For example, Jan-20 is (20/100)-1 and Feb-20 is (120/100)-1. In other words, every 2020 value is divided by the Dec-19 value, which is the last month of 2019.

For Jan-21, the value should be divided by the Dec-20 value, so it's (100/210)-1.
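
As a quick sanity check of the formula, the three examples above can be reproduced with plain arithmetic. This is only a sketch: ytd_change is a hypothetical helper name used for illustration, not a function from any package.

```r
# Hypothetical helper: YTD change of a value relative to the
# last month of the previous year.
ytd_change <- function(current, prev_year_end) {
  current / prev_year_end - 1
}

dec_2019 <- 100  # last month of 2019 in the example data
dec_2020 <- 210  # last month of 2020 in the example data

jan_2020 <- ytd_change(20, dec_2019)   # (20/100)-1
feb_2020 <- ytd_change(120, dec_2019)  # (120/100)-1
jan_2021 <- ytd_change(100, dec_2020)  # (100/210)-1

print(c(jan_2020, feb_2020, jan_2021))
```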

  df <- structure(list(date = structure(c(1575158400, 1577836800, 1580515200, 
  1583020800, 1585699200, 1588291200, 1590969600, 1593561600, 1596240000, 
  1598918400, 1601510400, 1604188800, 1606780800, 1609459200, 1612137600
  ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), x = c(100, 
  20, 120, 90, 100, 40, 55, 70, 90, 120, 290, 100, 210, 100, 130
  ), y = c(NA, -0.8, 0.2, -0.1, 0, -0.6, -0.45, -0.3, -0.1, 0.2, 
  1.9, 0, 1.1, -0.523809523809524, -0.380952380952381)), class = "data.frame", row.names = 
  c(NA, -15L))



      date   x          y
2019-12-01 100         NA
2020-01-01  20 -0.8000000
2020-02-01 120  0.2000000
2020-03-01  90 -0.1000000
2020-04-01 100  0.0000000
2020-05-01  40 -0.6000000
2020-06-01  55 -0.4500000
2020-07-01  70 -0.3000000
2020-08-01  90 -0.1000000
2020-09-01 120  0.2000000
2020-10-01 290  1.9000000
2020-11-01 100  0.0000000
2020-12-01 210  1.1000000
2021-01-01 100 -0.5238095
2021-02-01 130 -0.3809524

Answer:

Here's a solution using the tidyverse and lubridate packages. First, we create a data frame called last_per_year that stores the last value for each year. Then, in the main data frame, we calculate each date's previous year and use it to join with last_per_year. With that done, the YTD calculation is a simple division. Note that the inner join drops any row that has no previous-year value, which here is the Dec-19 row.

This technique would make it easy to select multiple columns in last_per_year, join those into the main data set, and compute whatever calculations are needed.

library(tidyverse)
library(lubridate)

last_per_year <- df %>% # YOUR DATA GOES HERE
  group_by(year = year(date)) %>% # for each year...
  slice_max(order_by = date) %>%  # get the last date in each year
  select(year, last_value = x)    # output columns are "year" and "last_value" (renamed from "x")

   year last_value
  <dbl>      <dbl>
1  2019        100
2  2020        210
3  2021        130

df.new <- df %>% 
  select(-y) %>%  # removing your example output
  mutate(
    year = year(date),
    prev_year = year - 1
  ) %>% 
  inner_join(last_per_year, by = c(prev_year = 'year')) %>% # joining with "last_per_year"
  mutate(
    ytd = x / last_value - 1
  )

df.new
         date   x year prev_year last_value        ytd
1  2020-01-01  20 2020      2019        100 -0.8000000
2  2020-02-01 120 2020      2019        100  0.2000000
3  2020-03-01  90 2020      2019        100 -0.1000000
4  2020-04-01 100 2020      2019        100  0.0000000
5  2020-05-01  40 2020      2019        100 -0.6000000
6  2020-06-01  55 2020      2019        100 -0.4500000
7  2020-07-01  70 2020      2019        100 -0.3000000
8  2020-08-01  90 2020      2019        100 -0.1000000
9  2020-09-01 120 2020      2019        100  0.2000000
10 2020-10-01 290 2020      2019        100  1.9000000
11 2020-11-01 100 2020      2019        100  0.0000000
12 2020-12-01 210 2020      2019        100  1.1000000
13 2021-01-01 100 2021      2020        210 -0.5238095
14 2021-02-01 130 2021      2020        210 -0.3809524
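
The earlier remark about carrying several columns through last_per_year can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the data frame df2, its second value column z, and the names last_per_year2 and result are all invented for the example. It also uses left_join instead of inner_join, so the first row (which has no previous-year value) is kept with NA, the way column y in the question shows it.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: x is a subset of the question's values, z is invented.
df2 <- data.frame(
  date = as.POSIXct(
    c("2019-12-01", "2020-01-01", "2020-02-01", "2020-12-01", "2021-01-01"),
    tz = "UTC"
  ),
  x = c(100, 20, 120, 210, 100),
  z = c(50, 60, 40, 80, 100)
)

# Last observation of each year, keeping both value columns and
# prefixing them with "last_" so they do not clash after the join.
last_per_year2 <- df2 %>%
  group_by(year = year(date)) %>%
  slice_max(order_by = date, n = 1) %>%
  ungroup() %>%
  select(year, x, z) %>%
  rename_with(~ paste0("last_", .x), .cols = c(x, z))

# left_join keeps rows without a previous-year value (their YTD is NA).
result <- df2 %>%
  mutate(prev_year = year(date) - 1) %>%
  left_join(last_per_year2, by = c(prev_year = "year")) %>%
  mutate(
    ytd_x = x / last_x - 1,
    ytd_z = z / last_z - 1
  )

print(result)
```

Whether to use inner_join or left_join depends on whether rows without a reference value should be dropped or reported as NA.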