Home > front end >  Calculating Lag for dataframe in R?
Calculating Lag for dataframe in R?

Time:05-11

If I have the following dataframe/tibble in R:

tibble(
  period =
    c(
      "2010Q1",
      "2010Q2",
      "2010Q3",
      "2010Q4",
      "2010END",
      "2011Q1",
      "2011Q2",
      "2011Q3",
      "2011Q4",
      "2011END",
      "2012Q1",
      "2012Q2",
      "2012Q3",
      "2012Q4",
      "20120END",
      "2010Q1",
      "2010Q2",
      "2010Q3",
      "2010Q4",
      "2010END",
      "2011Q1",
      "2011Q2",
      "2011Q3",
      "2011Q4",
      "2011END",
      "2012Q1",
      "2012Q2",
      "2012Q3",
      "2012Q4",
      "20120END"),
  website = c(
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook"
  ), 
  values = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1)
)

How can i perform the lag calculation for each period value from the year prior e.g I would like to create the calculation for 2011Q1 - 2010Q1 and so on including 2011END - 2010END

So that i get a table which looks like the following:

tibble(
  period =
    c(
      "2010Q1",
      "2010Q2",
      "2010Q3",
      "2010Q4",
      "2010END",
      "2011Q1",
      "2011Q2",
      "2011Q3",
      "2011Q4",
      "2011END",
      "2012Q1",
      "2012Q2",
      "2012Q3",
      "2012Q4",
      "20120END",
      "2010Q1",
      "2010Q2",
      "2010Q3",
      "2010Q4",
      "2010END",
      "2011Q1",
      "2011Q2",
      "2011Q3",
      "2011Q4",
      "2011END",
      "2012Q1",
      "2012Q2",
      "2012Q3",
      "2012Q4",
      "20120END"),
  website = c(
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "google",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook",
    "facebook"
  ), 
  values = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1), 
  calculation = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,14,12,10,8,6,4,2,0,-2,-4,-6,-8,-10,-12,-14))

Here it makes sense we cannot compare to the period prior so it is NA - for the year 2011 all values are calculated like this:

  • 2010q1 - NA (nothing before)
  • 2010q2 - NA (nothing before)
  • 2010q3 - NA (nothing before)
  • 2010q4 - NA (nothing before)
  • 2010END - NA (nothing before)
  • 2011q1 - 2010q1
  • 2011q2 - 2010q2
  • 2011q3 - 2010q3
  • 2011q4 - 2010q4
  • 2011END - 2010END

Having some issues with using the lag() function when trying to group the data by the period column.

CodePudding user response:

If every year is divided into 5 periods, lag(...,5) should take the value 5 rows earlier that the one you are calculating.

example %>% 
  mutate(calculation = values - lag(values,5))

Output:

# A tibble: 30 x 3
   period  values calculation
   <chr>    <dbl> <dbl>
 1 2010Q1       1    NA
 2 2010Q2       2    NA
 3 2010Q3       3    NA
 4 2010Q4       4    NA
 5 2010END      5    NA
 6 2011Q1       6     5
 7 2011Q2       7     5
 8 2011Q3       8     5
 9 2011Q4       9     5
10 2011END     10     5
# ... with 20 more rows

EDIT: As @AndrewGB accurately said, group_by(website) must be added to separate operations for each website. Also, I'm assuming that rows are already arranged by period.

example %>%
group_by(website) %>%
  mutate(calculation = values - lag(values,5))

CodePudding user response:

Here is another option using data.table, where we can use shift to calculate from the previous year matching value (e.g., 2011Q1 - 2010Q1). I'm also assuming that you want to do this for each website, so I have added a grouping.

library(data.table)

setDT(dt)[, calculation :=  values - shift(values, n = 5, type = "lag"), by = website]

Output

      period  website values calculation
 1:   2010Q1   google      1          NA
 2:   2010Q2   google      2          NA
 3:   2010Q3   google      3          NA
 4:   2010Q4   google      4          NA
 5:  2010END   google      5          NA
 6:   2011Q1   google      6           5
 7:   2011Q2   google      7           5
 8:   2011Q3   google      8           5
 9:   2011Q4   google      9           5
10:  2011END   google     10           5
11:   2012Q1   google     11           5
12:   2012Q2   google     12           5
13:   2012Q3   google     13           5
14:   2012Q4   google     14           5
15: 20120END   google     15           5
16:   2010Q1 facebook     15          NA
17:   2010Q2 facebook     14          NA
18:   2010Q3 facebook     13          NA
19:   2010Q4 facebook     12          NA
20:  2010END facebook     11          NA
21:   2011Q1 facebook     10          -5
22:   2011Q2 facebook      9          -5
23:   2011Q3 facebook      8          -5
24:   2011Q4 facebook      7          -5
25:  2011END facebook      6          -5
26:   2012Q1 facebook      5          -5
27:   2012Q2 facebook      4          -5
28:   2012Q3 facebook      3          -5
29:   2012Q4 facebook      2          -5
30: 20120END facebook      1          -5

CodePudding user response:

If you are lagging not in the context of a panel regression, dplyr::lag won't be your best bet. This is because it linearly takes the previous elements in the column as the lag.

I have used pivot_wider to perform these lags and then pivot_longer to reconstitute the original data frame plus the lagged variable. You may also need to separate your date column into year and quarter columns. Once you've done that, you should be able to calculate the correct lagging. I'll post with corresponding code in a little bit.

  • Related