I have the following data frame:
tibble(
company = rep(x = "google", each = 15),
date = c(
"2019q1",
"2019q2",
"2019q3",
"2019q4",
"2019end",
"2020q1",
"2020q2",
"2020q3",
"2020q4",
"2020end",
"2021q1",
"2021q2",
"2021q3",
"2021q4",
"2021end"
),
values = c(1:15)
)
How can i apply the following calculation to this data frame:
This will yield a new column called new_value
e.g. for the first row of 2019q1 this would be:
new_value = 2019q1 2018end 2018q1 More detailed result post 2019 years to follow
I am aware 2018 values do not exist so this will be simply NA
but when you get to 2020q1 the calculation would be this:
2020q1 2019end 2019q1
2020q2 2019end 2019q2
2020q3 2019end 2019q3
2020q4 2019end 2019q4
2020end 2019end 2020end
And so on and so forth.
In essence the formula per row is the following:
current date end date value of previous year previous date value
I have had some issues with trying to do this in long format, pivotting the data doesn't yield the desired result either
CodePudding user response:
Here is a solution using left_joins of mutated versions of the dataframe to align the correct values in each row, prior to a final rowwise addition and cleanup.
library(tidyverse)
tibble(
company = rep(x = "google", each = 15),
date = c(
"2019q1",
"2019q2",
"2019q3",
"2019q4",
"2019end",
"2020q1",
"2020q2",
"2020q3",
"2020q4",
"2020end",
"2021q1",
"2021q2",
"2021q3",
"2021q4",
"2021end"
),
values = c(1:15)
) %>%
mutate(date = str_replace(date, "end", "q5")) %>%
separate(date, into = c("year", "quarter"), sep = "q") %>%
mutate(across(c("year", "quarter", "values"), as.integer)) %>%
left_join(., group_by(., year) %>%
slice_tail(n = 1) %>%
mutate(year = year 1), by = "year") %>%
left_join(., mutate(., year = year 1), by = c("year", "quarter.x")) %>%
mutate(new_values = values.x.x values.y.x values.x.y) %>%
select(company.x.x, year, quarter.x, values.x.x, new_values) %>%
rename_with(~str_remove(., "\\.. "))
# # A tibble: 15 × 5
# company year quarter values new_values
# <chr> <dbl> <int> <int> <int>
# 1 google 2019 1 1 NA
# 2 google 2019 2 2 NA
# 3 google 2019 3 3 NA
# 4 google 2019 4 4 NA
# 5 google 2019 5 5 NA
# 6 google 2020 1 6 12
# 7 google 2020 2 7 14
# 8 google 2020 3 8 16
# 9 google 2020 4 9 18
# 10 google 2020 5 10 20
# 11 google 2021 1 11 27
# 12 google 2021 2 12 29
# 13 google 2021 3 13 31
# 14 google 2021 4 14 33
# 15 google 2021 5 15 35
CodePudding user response:
Borrowed the separate
from @dcsuka and used some lagging to get all the numbers we need on the same row then add them all up
library(tidyverse)
tibble(
company = rep(x = "google", each = 15),
date = c(
"2019q1",
"2019q2",
"2019q3",
"2019q4",
"2019end",
"2020q1",
"2020q2",
"2020q3",
"2020q4",
"2020end",
"2021q1",
"2021q2",
"2021q3",
"2021q4",
"2021end"
),
values = c(1:15)
) %>%
mutate(date = str_replace(date, "end", "q5")) %>%
separate(date, into = c("year", "quarter"), sep = "q", convert = TRUE) %>%
group_by(company) %>%
mutate(
l_x = lag(values,5),
constant_x = lag(ifelse(quarter == 5 & last(values) != values,values,NA),1)) %>%
fill(constant_x,.direction = "down") %>%
mutate(result = values l_x constant_x) %>%
select (-ends_with("_x"))