Adding sales by previous year and quarter using a custom formula-CodePudding

I have the following data frame:

  tibble(
    company = rep(x = "google", each = 15),
    date = c(
      "2019q1",
      "2019q2",
      "2019q3",
      "2019q4",
      "2019end",
      "2020q1",
      "2020q2",
      "2020q3",
      "2020q4",
      "2020end",
      "2021q1",
      "2021q2",
      "2021q3",
      "2021q4",
      "2021end"
    ),
    values = c(1:15)
  )

How can i apply the following calculation to this data frame:

This will yield a new column called new_value

e.g. for the first row of 2019q1 this would be:

new_value = 2019q1 2018end 2018q1 More detailed result post 2019 years to follow

I am aware 2018 values do not exist so this will be simply NA

but when you get to 2020q1 the calculation would be this:

2020q1 2019end 2019q1

2020q2 2019end 2019q2

2020q3 2019end 2019q3

2020q4 2019end 2019q4

2020end 2019end 2020end

And so on and so forth.

In essence the formula per row is the following:

current date end date value of previous year previous date value

I have had some issues with trying to do this in long format, pivotting the data doesn't yield the desired result either

CodePudding user response：

Here is a solution using left_joins of mutated versions of the dataframe to align the correct values in each row, prior to a final rowwise addition and cleanup.

library(tidyverse)

tibble(
    company = rep(x = "google", each = 15),
    date = c(
      "2019q1",
      "2019q2",
      "2019q3",
      "2019q4",
      "2019end",
      "2020q1",
      "2020q2",
      "2020q3",
      "2020q4",
      "2020end",
      "2021q1",
      "2021q2",
      "2021q3",
      "2021q4",
      "2021end"
    ),
    values = c(1:15)
  ) %>%
  mutate(date = str_replace(date, "end", "q5")) %>%
  separate(date, into = c("year", "quarter"), sep = "q") %>%
  mutate(across(c("year", "quarter", "values"), as.integer)) %>%
  left_join(., group_by(., year) %>%
    slice_tail(n = 1) %>%
    mutate(year = year   1), by = "year") %>%
  left_join(., mutate(., year = year   1), by = c("year", "quarter.x")) %>%
  mutate(new_values = values.x.x   values.y.x   values.x.y) %>%
  select(company.x.x, year, quarter.x, values.x.x, new_values) %>%
  rename_with(~str_remove(., "\\.. "))

# # A tibble: 15 × 5
#    company  year quarter values new_values
#    <chr>   <dbl>   <int>  <int>      <int>
#  1 google   2019       1      1         NA
#  2 google   2019       2      2         NA
#  3 google   2019       3      3         NA
#  4 google   2019       4      4         NA
#  5 google   2019       5      5         NA
#  6 google   2020       1      6         12
#  7 google   2020       2      7         14
#  8 google   2020       3      8         16
#  9 google   2020       4      9         18
# 10 google   2020       5     10         20
# 11 google   2021       1     11         27
# 12 google   2021       2     12         29
# 13 google   2021       3     13         31
# 14 google   2021       4     14         33
# 15 google   2021       5     15         35

CodePudding user response：

Borrowed the separate from @dcsuka and used some lagging to get all the numbers we need on the same row then add them all up

library(tidyverse)

tibble(
    company = rep(x = "google", each = 15),
    date = c(
      "2019q1",
      "2019q2",
      "2019q3",
      "2019q4",
      "2019end",
      "2020q1",
      "2020q2",
      "2020q3",
      "2020q4",
      "2020end",
      "2021q1",
      "2021q2",
      "2021q3",
      "2021q4",
      "2021end"
    ),
    values = c(1:15)
  ) %>%
  mutate(date = str_replace(date, "end", "q5")) %>%
  separate(date, into = c("year", "quarter"), sep = "q", convert = TRUE) %>%
group_by(company) %>%
mutate(
l_x = lag(values,5),
constant_x = lag(ifelse(quarter == 5 & last(values) != values,values,NA),1)) %>%
fill(constant_x,.direction = "down") %>%
mutate(result = values   l_x   constant_x) %>%
select (-ends_with("_x"))