Home > OS >  Conditional Calculation in R Dataframe?
Conditional Calculation in R Dataframe?

Time:07-21

IF i have the following dataframe:

tibble(date = c("2019end", "2020q1", "2020q2", "2020q3", "2020q4", "2021q1", "2021q2", "2021q3", "2021end"), 
           company = c("google","google","google","google","google","google","google","google","google" ), 
           value = c(1,2,3,4,5,6,7,8,9))

And a condition i have on the data is to calculate the differences between periods and years e.g. 2021q1 - 2020q1 = change between those values.

For the specific case when a q4 value is given for one year, but "end" is not given for the following year (ahead or behind)

for example in the above dataframe, 2020q4 is given, but 2021q4 is not given, but 2021end is given. hence calculation would be 2021end - 2020q4

And also vice versa, so if 2020end was given, but 2021Q4 was given, then the calcuation would be 2021Q4 - 2020end

So for the example above dataframe, in this case, i would like to do the calculation of change as 2021end - 2020q4.

How can i code this scenario out?

Effectively this is what the table should look like:

| date | company| value| change|
| - | - |-|-| 
|2019end| google|1|1-NA|
| 2020q1  | google|2|2- NA |
| 2020q2| google|3|3-NA |
| 2020q3  | google|4|4-NA |
| 2020q4  | google|5|5-NA |
| 2021q1  | google|6|6-2 = 4 |
| 2021q2  | google|7|7-3 = 4 |
| 2021q3  | google|8|8-4= 4 |
| 2021end  | google|9|9-5 =4|

Response to r2Evans Answer:

If i would like to calculate the change in percentage terms with reference to the original periods value.

So for example at this stage:

dat %>%
  separate(date, c("year", "qtr"), sep = 4) %>%
  mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
  complete(year, qtr, company) %>%
  arrange(year, qtr) %>%
  fill(value) %>%
  pivot_wider(c(company, year), names_from = "qtr", values_from = "value") %>%
  group_by(company) %>%
  mutate(across(-year, ~ c(NA, diff(.))))

When we get to the across statement, how can i do the following calculation, for example taking 2021q1:

(2021q1 - 2020q1)/2020q1

as this would be the relevant change and then again for all the other periods so it would be:

(2021q2 - 2020q2)/2020q2

(2021q3 - 2020q3)/2020q3

(20201q4 - 2020q4)/2020q4

(2021end - 2020end)/2020end

This includes where end is in place of q4.

I think this is a custom function in across

CodePudding user response:

Up front, try this:

library(dplyr)
library(tidyr) # complete, separate, unite, fill, pivot_*
dat %>%
  separate(date, c("year", "qtr"), sep = 4) %>%
  mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
  complete(year, qtr, company) %>%
  arrange(year, qtr) %>%
  fill(value) %>%
  pivot_wider(c(company, year), names_from = "qtr", values_from = "value") %>%
  group_by(company) %>%
  mutate(across(-year, ~ c(NA, diff(.)))) %>%
  ungroup() %>%
  pivot_longer(-c(company, year), names_to = "qtr", values_to = "change") %>%
  unite("date", c("year", "qtr"), sep = "") %>%
  left_join(dat, ., by = c("company", "date"))
# # A tibble: 9 x 4
#   date    company value change
#   <chr>   <chr>   <dbl>  <dbl>
# 1 2019end google      1     NA
# 2 2020q1  google      2     NA
# 3 2020q2  google      3     NA
# 4 2020q3  google      4     NA
# 5 2020q4  google      5     NA
# 6 2021q1  google      6      4
# 7 2021q2  google      7      4
# 8 2021q3  google      8      4
# 9 2021end google      9      4

Interim steps:

First, separate year from qtr, convert into factors (for ordering) and ensure we see all five (q1-5 and end) for each year. This produces some NA before our observed data, no problems. We fill NA values with the preceding (by quarter) value, so that we can handle your corner-case of end-q4.

dat %>%
  separate(date, c("year", "qtr"), sep = 4) %>%
  mutate(qtr = factor(qtr, levels = c("q1", "q2", "q3", "q4", "end"))) %>%
  complete(year, qtr, company) %>%
  arrange(year, qtr) %>%
  fill(value)
# # A tibble: 15 x 4
#    year  qtr   company value
#    <chr> <fct> <chr>   <dbl>
#  1 2019  q1    google     NA
#  2 2019  q2    google     NA
#  3 2019  q3    google     NA
#  4 2019  q4    google     NA
#  5 2019  end   google      1
#  6 2020  q1    google      2
#  7 2020  q2    google      3
#  8 2020  q3    google      4
#  9 2020  q4    google      5
# 10 2020  end   google      5
# 11 2021  q1    google      6
# 12 2021  q2    google      7
# 13 2021  q3    google      8
# 14 2021  q4    google      8
# 15 2021  end   google      9

Pivot wider, handy now as we can do row-wise (across quarters) across years.

... %>%
  pivot_wider(c(company, year), names_from = "qtr", values_from = "value")
# # A tibble: 3 x 7
#   company year     q1    q2    q3    q4   end
#   <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 google  2019     NA    NA    NA    NA     1
# 2 google  2020      2     3     4     5     5
# 3 google  2021      6     7     8     8     9

Calculate the differences.

... %>%
  group_by(company) %>%
  mutate(across(-year, ~ c(NA, diff(.)))) %>%
  ungroup()
# # A tibble: 3 x 7
#   company year     q1    q2    q3    q4   end
#   <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 google  2019     NA    NA    NA    NA    NA
# 2 google  2020     NA    NA    NA    NA     4
# 3 google  2021      4     4     4     3     4

Pivot back into long format. Unite the year/qtr columns back into the original date strings. Note that we have more rows than the original data, which will be resolved in the next step.

... %>%
  tidyr::pivot_longer(-c(company, year), names_to = "qtr", values_to = "change") %>%
  tidyr::unite("date", c("year", "qtr"), sep = "")
# # A tibble: 15 x 3
#    company date    change
#    <chr>   <chr>    <dbl>
#  1 google  2019q1      NA
#  2 google  2019q2      NA
#  3 google  2019q3      NA
#  4 google  2019q4      NA
#  5 google  2019end     NA
#  6 google  2020q1      NA
#  7 google  2020q2      NA
#  8 google  2020q3      NA
#  9 google  2020q4      NA
# 10 google  2020end      4
# 11 google  2021q1       4
# 12 google  2021q2       4
# 13 google  2021q3       4
# 14 google  2021q4       3
# 15 google  2021end      4

And then we left-join this new data onto the original data, which removes the extra date rows we created.

  • Related