IF i have the following dataframe:
tibble(date = c("2019end", "2020q1", "2020q2", "2020q3", "2020q4", "2021q1", "2021q2", "2021q3", "2021end"),
company = c("google","google","google","google","google","google","google","google","google" ),
value = c(1,2,3,4,5,6,7,8,9))
And a condition i have on the data is to calculate the differences between periods and years e.g. 2021q1 - 2020q1 = change between those values.
For the specific case when a q4 value is given for one year, but "end" is not given for the following year (ahead or behind)
for example in the above dataframe, 2020q4 is given, but 2021q4 is not given, but 2021end is given. hence calculation would be 2021end - 2020q4
And also vice versa, so if 2020end was given, but 2021Q4 was given, then the calcuation would be 2021Q4 - 2020end
So for the example above dataframe, in this case, i would like to do the calculation of change as 2021end - 2020q4.
How can i code this scenario out?
Effectively this is what the table should look like:
| date | company| value| change|
| - | - |-|-|
|2019end| google|1|1-NA|
| 2020q1 | google|2|2- NA |
| 2020q2| google|3|3-NA |
| 2020q3 | google|4|4-NA |
| 2020q4 | google|5|5-NA |
| 2021q1 | google|6|6-2 = 4 |
| 2021q2 | google|7|7-3 = 4 |
| 2021q3 | google|8|8-4= 4 |
| 2021end | google|9|9-5 =4|
Response to r2Evans Answer:
If i would like to calculate the change in percentage terms with reference to the original periods value.
So for example at this stage:
dat %>%
separate(date, c("year", "qtr"), sep = 4) %>%
mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
complete(year, qtr, company) %>%
arrange(year, qtr) %>%
fill(value) %>%
pivot_wider(c(company, year), names_from = "qtr", values_from = "value") %>%
group_by(company) %>%
mutate(across(-year, ~ c(NA, diff(.))))
When we get to the across statement, how can i do the following calculation, for example taking 2021q1:
(2021q1 - 2020q1)/2020q1
as this would be the relevant change and then again for all the other periods so it would be:
(2021q2 - 2020q2)/2020q2
(2021q3 - 2020q3)/2020q3
(20201q4 - 2020q4)/2020q4
(2021end - 2020end)/2020end
This includes where end is in place of q4.
I think this is a custom function in across
CodePudding user response:
Up front, try this:
library(dplyr)
library(tidyr) # complete, separate, unite, fill, pivot_*
dat %>%
separate(date, c("year", "qtr"), sep = 4) %>%
mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
complete(year, qtr, company) %>%
arrange(year, qtr) %>%
fill(value) %>%
pivot_wider(c(company, year), names_from = "qtr", values_from = "value") %>%
group_by(company) %>%
mutate(across(-year, ~ c(NA, diff(.)))) %>%
ungroup() %>%
pivot_longer(-c(company, year), names_to = "qtr", values_to = "change") %>%
unite("date", c("year", "qtr"), sep = "") %>%
left_join(dat, ., by = c("company", "date"))
# # A tibble: 9 x 4
# date company value change
# <chr> <chr> <dbl> <dbl>
# 1 2019end google 1 NA
# 2 2020q1 google 2 NA
# 3 2020q2 google 3 NA
# 4 2020q3 google 4 NA
# 5 2020q4 google 5 NA
# 6 2021q1 google 6 4
# 7 2021q2 google 7 4
# 8 2021q3 google 8 4
# 9 2021end google 9 4
Interim steps:
First, separate year
from qtr
, convert into factors (for ordering) and ensure we see all five (q1-5 and end) for each year. This produces some NA
before our observed data, no problems. We fill NA
values with the preceding (by quarter) value, so that we can handle your corner-case of end-q4
.
dat %>%
separate(date, c("year", "qtr"), sep = 4) %>%
mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
complete(year, qtr, company) %>%
arrange(year, qtr) %>%
fill(value)
# # A tibble: 15 x 4
# year qtr company value
# <chr> <fct> <chr> <dbl>
# 1 2019 q1 google NA
# 2 2019 q2 google NA
# 3 2019 q3 google NA
# 4 2019 q4 google NA
# 5 2019 end google 1
# 6 2020 q1 google 2
# 7 2020 q2 google 3
# 8 2020 q3 google 4
# 9 2020 q4 google 5
# 10 2020 end google 5
# 11 2021 q1 google 6
# 12 2021 q2 google 7
# 13 2021 q3 google 8
# 14 2021 q4 google 8
# 15 2021 end google 9
Pivot wider, handy now as we can do row-wise (across quarters) across years.
... %>%
pivot_wider(c(company, year), names_from = "qtr", values_from = "value")
# # A tibble: 3 x 7
# company year q1 q2 q3 q4 end
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 google 2019 NA NA NA NA 1
# 2 google 2020 2 3 4 5 5
# 3 google 2021 6 7 8 8 9
Calculate the differences.
... %>%
group_by(company) %>%
mutate(across(-year, ~ c(NA, diff(.)))) %>%
ungroup()
# # A tibble: 3 x 7
# company year q1 q2 q3 q4 end
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 google 2019 NA NA NA NA NA
# 2 google 2020 NA NA NA NA 4
# 3 google 2021 4 4 4 3 4
Pivot back into long format. Unite the year
/qtr
columns back into the original date
strings. Note that we have more rows than the original data, which will be resolved in the next step.
... %>%
tidyr::pivot_longer(-c(company, year), names_to = "qtr", values_to = "change") %>%
tidyr::unite("date", c("year", "qtr"), sep = "")
# # A tibble: 15 x 3
# company date change
# <chr> <chr> <dbl>
# 1 google 2019q1 NA
# 2 google 2019q2 NA
# 3 google 2019q3 NA
# 4 google 2019q4 NA
# 5 google 2019end NA
# 6 google 2020q1 NA
# 7 google 2020q2 NA
# 8 google 2020q3 NA
# 9 google 2020q4 NA
# 10 google 2020end 4
# 11 google 2021q1 4
# 12 google 2021q2 4
# 13 google 2021q3 4
# 14 google 2021q4 3
# 15 google 2021end 4
And then we left-join this new data onto the original data, which removes the extra date
rows we created.