I'm making a table of lagged columns for time series data, but I'm having trouble reshaping the data.
My original data.table looks like this:
A data table sorted by descending years column and a doy column with one value, the n_a column starts from 9 and descends to 4
And I want to make it look like this:
Lagged variable time series table where each column starts with the row after the prev
CodePudding user response:
Assuming your data frame is called df
, you can do:
df[4:7] <- lapply(1:4, function(x) dplyr::lead(df$n_a, x))
names(df)[4:7] <- paste0('n_a_lag', 1:4)
df
#> doy year n_a n_a_lag1 n_a_lag2 n_a_lag3 n_a_lag4
#> 1 1 2022 9 8 7 6 5
#> 2 1 2021 8 7 6 5 4
#> 3 1 2020 7 6 5 4 NA
#> 4 1 2019 6 5 4 NA NA
#> 5 1 2018 5 4 NA NA NA
#> 6 1 2017 4 NA NA NA NA
Data taken from image in question, in reproducible format
df <- data.frame(doy = 1, year = 2022:2017, n_a = 9:4)
df
#> doy year n_a
#> 1 1 2022 9
#> 2 1 2021 8
#> 3 1 2020 7
#> 4 1 2019 6
#> 5 1 2018 5
#> 6 1 2017 4
Created on 2022-07-21 by the reprex package (v2.0.1)
CodePudding user response:
You can use data.table::shift
for a multiple leads and lags:
d[,(paste0("n_a_lag",1:4)):= shift(n_a,1:4,type = "lead")]
Output:
doy year n_a n_a_lag1 n_a_lag2 n_a_lag3 n_a_lag4
<num> <int> <int> <int> <int> <int> <int>
1: 1 2022 9 8 7 6 5
2: 1 2021 8 7 6 5 4
3: 1 2020 7 6 5 4 NA
4: 1 2019 6 5 4 NA NA
5: 1 2018 5 4 NA NA NA
6: 1 2017 4 NA NA NA NA
Input:
d = data.table(
doy =c(1,1,1,1,1,1),
year = 2022:2017,
n_a=9:4
)