I have a data df
as follows:
df <- data.frame(id = c(1,1,1,2,2,2,2,3,3),
year=c(2011,2012,2013,2010,2011,2012,2013,2012,2013),
points=c(45,69,79,53,13,12,11,89,91),
result = c(2,3,5,4,6,1,2,3,4))
But I want to make df
as below:
df <- data.frame(id = c(1,1,2,2,2,3),
year=c(2011,2012,2010,2011,2012,2012),
points=c(45,69,53,13,12,89),
result = c(3,5,6,1,2,4))
Here, I want to do some regression with the response variable result
. Since I want to estimate result
, I have to delay the response variable result
and leave the other dependent variable points
. So, for my regression setting, result
is the response variable and points
is the dependent variable. In summary, I want to do time lag for result
. Within each id, each last row should be removed because, there are no next result
.
I simplified my problem for demonstration purpose. Is there any way to achieve this using R?
CodePudding user response:
Tidyverse solution:
library(tidyverse)
df %>% group_by(id) %>% mutate(lead_result = lead(result)) %>% na.exclude
# A tibble: 6 x 5
# Groups: id [3]
id year points result lead_result
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2011 45 2 3
2 1 2012 69 3 5
3 2 2010 53 4 6
4 2 2011 13 6 1
5 2 2012 12 1 2
6 3 2012 89 3 4
CodePudding user response:
A data.table
solution:
library(data.table)
na.omit(setDT(df)[, result := shift(result, type = "lead"), by = "id"], "result")
Output
id year points result
<num> <num> <num> <num>
1: 1 2011 45 3
2: 1 2012 69 5
3: 2 2010 53 6
4: 2 2011 13 1
5: 2 2012 12 2
6: 3 2012 89 4