I have an unbalanced panel (with unequally spaced measurement points) and would like to create a lagged variable of x
by group (Variable: id
) but only for consecutive time points. My data looks like this:
# simple example with an unbalanced panel
base <- data.frame(id = rep(1:2, each = 7),
time = c(1, 2, 3, 4, 7, 8, 10, 3, 4, 6, 9, 10, 11, 14),
x = rnorm(14, mean = 3, sd = 1))
I already tried this code using dplyr
:
base_lag <- base %>% # Add lagged column
group_by(id) %>%
dplyr::mutate(lag1_x = dplyr::lag(x, n = 1, default = NA)) %>%
as.data.frame()
base_lag # Print updated data
However, this way I get a lagged variable regardless of the fact that in some cases it is not two consecutive time points.
My final data set should look like this:
id time x lag1_x
1 1 1 3.437416 NA
2 1 2 2.300553 3.437416
3 1 3 2.374212 2.300553
4 1 4 4.374009 2.374212
5 1 7 1.177433 NA
6 1 8 1.543353 1.177433
7 1 10 3.222358 NA
8 2 3 3.763765 NA
9 2 4 3.881182 3.763765
10 2 6 4.754420 NA
11 2 9 4.518227 NA
12 2 10 2.512486 4.518227
13 2 11 3.129230 2.512486
14 2 14 2.152509 NA
Does anyone here have a tip for me on how to create this lagged variable? Many thanks in advance!
CodePudding user response:
You could use ifelse
, testing whether diff(time)
is equal to 1. If so, write the lag. If not, write an NA
.
base %>%
group_by(id) %>%
mutate(lag1_x = ifelse(c(0, diff(time)) == 1, lag(x, n = 1, default = NA), NA)) %>%
as.data.frame()
#> id time x lag1_x
#> 1 1 1 1.852343 NA
#> 2 1 2 2.710538 1.852343
#> 3 1 3 2.700785 2.710538
#> 4 1 4 2.588489 2.700785
#> 5 1 7 3.252223 NA
#> 6 1 8 2.108079 3.252223
#> 7 1 10 3.435683 NA
#> 8 2 3 1.762462 NA
#> 9 2 4 2.775732 1.762462
#> 10 2 6 3.377396 NA
#> 11 2 9 3.133336 NA
#> 12 2 10 3.804190 3.133336
#> 13 2 11 2.942893 3.804190
#> 14 2 14 3.503608 NA
CodePudding user response:
An option is also to create a grouping based on the diff
erence
library(dplyr)
base %>%
group_by(id, grp = cumsum(c(TRUE, diff(time) != 1))) %>%
mutate(lag1_x = lag(x)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 14 × 4
id time x lag1_x
<int> <dbl> <dbl> <dbl>
1 1 1 3.81 NA
2 1 2 2.79 3.81
3 1 3 3.04 2.79
4 1 4 1.76 3.04
5 1 7 1.72 NA
6 1 8 2.68 1.72
7 1 10 3.31 NA
8 2 3 2.92 NA
9 2 4 2.02 2.92
10 2 6 1.71 NA
11 2 9 2.56 NA
12 2 10 1.62 2.56
13 2 11 3.30 1.62
14 2 14 3.69 NA