Hi, I am trying to find the YTD change. The YTD formula is (current month value / value of the last month of the previous year) - 1. The result I would like to get is in column y.
For example, Jan-20 is (20/100)-1 and Feb-20 is (120/100)-1. Basically, all 2020 values are divided by the Dec-19 value, since Dec-19 is the last month of 2019.
For Jan-21, the value should be divided by the Dec-20 value instead, so it's (100/210)-1.
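The same rule written as a formula. Here x_t is the value in month t, Y(t) is the year of month t, and x_{Dec, Y(t)-1} is the December value of the previous year. This notation is introduced only for illustration:

```latex
\[
\mathrm{YTD}_t = \frac{x_t}{x_{\mathrm{Dec},\,Y(t)-1}} - 1,
\qquad\text{e.g.}\quad
\mathrm{YTD}_{\text{Jan-21}} = \frac{100}{210} - 1 \approx -0.5238
\]
```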
structure(list(date = structure(c(1575158400, 1577836800, 1580515200,
1583020800, 1585699200, 1588291200, 1590969600, 1593561600, 1596240000,
1598918400, 1601510400, 1604188800, 1606780800, 1609459200, 1612137600
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), x = c(100,
20, 120, 90, 100, 40, 55, 70, 90, 120, 290, 100, 210, 100, 130
), y = c(NA, -0.8, 0.2, -0.1, 0, -0.6, -0.45, -0.3, -0.1, 0.2,
1.9, 0, 1.1, -0.523809523809524, -0.380952380952381)), class = "data.frame", row.names =
c(NA, -15L))
date x y
2019-12-01 100 NA
2020-01-01 20 -0.8000000
2020-02-01 120 0.2000000
2020-03-01 90 -0.1000000
2020-04-01 100 0.0000000
2020-05-01 40 -0.6000000
2020-06-01 55 -0.4500000
2020-07-01 70 -0.3000000
2020-08-01 90 -0.1000000
2020-09-01 120 0.2000000
2020-10-01 290 1.9000000
2020-11-01 100 0.0000000
2020-12-01 210 1.1000000
2021-01-01 100 -0.5238095
2021-02-01 130 -0.3809524
Answer:
Here's a solution using the tidyverse (dplyr) and lubridate packages. First we create a data frame called last_per_year that stores the last value of each year. Then, in the main data frame, we compute each row's previous year and use it to join with last_per_year. With that done, the YTD calculation is a simple division.
This technique also makes it easy to select multiple columns in last_per_year, join them into the main data set, and compute whatever calculations are needed.
library(dplyr)
library(lubridate)

last_per_year <- df %>%               # YOUR DATA GOES HERE
  group_by(year = year(date)) %>%     # for each year...
  slice_max(order_by = date) %>%      # keep the row with the latest date in that year
  select(year, last_value = x)        # output columns are "year" and "last_value" (renamed from "x")
year last_value
<dbl> <dbl>
1 2019 100
2 2020 210
3 2021 130
df.new <- df %>%
  select(-y) %>%                      # removing your example output
  mutate(
    year = year(date),
    prev_year = year - 1
  ) %>%
  inner_join(last_per_year, by = c(prev_year = 'year')) %>%   # joining with "last_per_year"
  mutate(
    ytd = x / last_value - 1
  )
date x year prev_year last_value ytd
1 2020-01-01 20 2020 2019 100 -0.8000000
2 2020-02-01 120 2020 2019 100 0.2000000
3 2020-03-01 90 2020 2019 100 -0.1000000
4 2020-04-01 100 2020 2019 100 0.0000000
5 2020-05-01 40 2020 2019 100 -0.6000000
6 2020-06-01 55 2020 2019 100 -0.4500000
7 2020-07-01 70 2020 2019 100 -0.3000000
8 2020-08-01 90 2020 2019 100 -0.1000000
9 2020-09-01 120 2020 2019 100 0.2000000
10 2020-10-01 290 2020 2019 100 1.9000000
11 2020-11-01 100 2020 2019 100 0.0000000
12 2020-12-01 210 2020 2019 100 1.1000000
13 2021-01-01 100 2021 2020 210 -0.5238095
14 2021-02-01 130 2021 2020 210 -0.3809524
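Note that inner_join() drops the Dec-19 row, because there is no 2018 value to divide by. Below is a minimal sketch, assuming dplyr and lubridate are installed. It re-creates the data from the question as df and swaps in left_join(), which keeps every row (Dec-19 gets NA). The result then lines up row for row with the expected column y and can be checked directly. The name df.check is hypothetical.

```r
library(dplyr)
library(lubridate)

# The data from the question (output of dput())
df <- structure(list(date = structure(c(1575158400, 1577836800, 1580515200,
  1583020800, 1585699200, 1588291200, 1590969600, 1593561600, 1596240000,
  1598918400, 1601510400, 1604188800, 1606780800, 1609459200, 1612137600
  ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), x = c(100,
  20, 120, 90, 100, 40, 55, 70, 90, 120, 290, 100, 210, 100, 130
  ), y = c(NA, -0.8, 0.2, -0.1, 0, -0.6, -0.45, -0.3, -0.1, 0.2,
  1.9, 0, 1.1, -0.523809523809524, -0.380952380952381)),
  class = "data.frame", row.names = c(NA, -15L))

# Last value of each calendar year; ungroup() so the lookup table is a plain tibble
last_per_year <- df %>%
  group_by(year = year(date)) %>%
  slice_max(order_by = date, n = 1) %>%
  ungroup() %>%
  select(year, last_value = x)

# left_join() keeps every row of df; rows with no previous year get NA
df.check <- df %>%
  mutate(prev_year = year(date) - 1) %>%
  left_join(last_per_year, by = c(prev_year = "year")) %>%
  mutate(ytd = x / last_value - 1)

# Compare with the expected column y (all.equal() accepts NAs in the same positions)
all.equal(df.check$ytd, df.check$y)
# [1] TRUE
```

Whether to use inner_join() or left_join() depends on whether you want the first year's rows kept as NA or dropped.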