Home > other >  One row time lag using R
One row time lag using R

Time:07-06

I have a data df as follows:

df <- data.frame(id = c(1,1,1,2,2,2,2,3,3),
                 year=c(2011,2012,2013,2010,2011,2012,2013,2012,2013),
                 points=c(45,69,79,53,13,12,11,89,91),
                 result = c(2,3,5,4,6,1,2,3,4))

But I want to make df as below:

df <- data.frame(id = c(1,1,2,2,2,3),
                 year=c(2011,2012,2010,2011,2012,2012),
                 points=c(45,69,53,13,12,89),
                 result = c(3,5,6,1,2,4))

Here, I want to do some regression with the response variable result. Since I want to estimate result, I have to delay the response variable result and leave the other dependent variable points. So, for my regression setting, result is the response variable and points is the dependent variable. In summary, I want to do time lag for result. Within each id, each last row should be removed because, there are no next result. I simplified my problem for demonstration purpose. Is there any way to achieve this using R?

CodePudding user response:

Tidyverse solution:

library(tidyverse)
df %>% group_by(id) %>% mutate(lead_result = lead(result)) %>% na.exclude

# A tibble: 6 x 5
# Groups:   id [3]
     id  year points result lead_result
  <dbl> <dbl>  <dbl>  <dbl>       <dbl>
1     1  2011     45      2           3
2     1  2012     69      3           5
3     2  2010     53      4           6
4     2  2011     13      6           1
5     2  2012     12      1           2
6     3  2012     89      3           4

CodePudding user response:

A data.table solution:

library(data.table)

na.omit(setDT(df)[, result := shift(result, type = "lead"), by = "id"], "result")

Output

      id  year points result
   <num> <num>  <num>  <num>
1:     1  2011     45      3
2:     1  2012     69      5
3:     2  2010     53      6
4:     2  2011     13      1
5:     2  2012     12      2
6:     3  2012     89      4
  • Related