I want to create a lag variable in my dataframe.
I have a list of time c("20110127", "20110128", "20110129", "20110130", "20110131", "20110201", "20110202")
.
My dataframe look like:
ID | TIME | VALUE |
---|---|---|
1 | 20110127 | 3.23 |
1 | 20110128 | 4.57 |
1 | 20110130 | 3.22 |
1 | 20110131 | 4.33 |
1 | 20110202 | 5.23 |
2 | 20110127 | 0.86 |
2 | 20110128 | 7.55 |
2 | 20110129 | 1.24 |
2 | 20110201 | 3.69 |
The year in the table is not consecutive. And I want to group_by ID then create a one-day lag variable. Which look like this:
ID | TIME | VALUE | LAG |
---|---|---|---|
1 | 20110127 | 3.23 | NA |
1 | 20110128 | 4.57 | 3.23 |
1 | 20110130 | 3.22 | NA |
1 | 20110131 | 4.33 | 3.22 |
1 | 20110201 | 1.22 | 4.33 |
1 | 20110202 | 5.23 | 1.22 |
2 | 20110127 | 0.86 | NA |
2 | 20110128 | 7.55 | 0.86 |
2 | 20110129 | 1.24 | 7.55 |
2 | 20110201 | 3.69 | NA |
I tried
alltime = c("20110127", "20110128", "20110129", "20110130", "20110131", "20110201", "20110202")
last_day = data %>%
arrange(ID, TIME) %>%
group_by(ID) %>%
mutate(lagtime = alltime[which(alltime == TIME) 1], lagoutput = value ,value = NULL)
data %>%
left_join(last_day)
But it seems like the mutate function wouldn't work here. Maybe the problem is alltime[which(alltime == TIME) 1]
and I cannot create a lag.
Appreciate any comment.
CodePudding user response:
You can use diff()
and lubridate::ymd()
to make a temporary column that indicates if there are gaps in the dates; then leverage that gap to get conditional lag of VALUE
data %>%
group_by(ID) %>%
mutate(g = c(0,diff(lubridate::ymd(TIME))),
LAG = if_else(g==1,lag(VALUE),as.double(NA))) %>%
select(-g)
Output:
ID TIME VALUE LAG
<int> <int> <dbl> <dbl>
1 1 20110127 3.23 NA
2 1 20110128 4.57 3.23
3 1 20110130 3.22 NA
4 1 20110131 4.33 3.22
5 1 20110202 5.23 NA
6 2 20110127 0.86 NA
7 2 20110128 7.55 0.86
8 2 20110129 1.24 7.55
9 2 20110201 3.69 NA
CodePudding user response:
We can use an ifelse
statement. So, if the rows are 1 day apart (here I convert to a time format using lubridate
), then return lag
value
and if not return NA
.
library(tidyverse)
library(lubridate)
df %>%
arrange(ID, TIME) %>%
group_by(ID) %>%
mutate(LAG = ifelse(ymd(TIME) - lag(ymd(TIME)) == 1, lag(VALUE), NA))
Output
ID TIME VALUE LAG
<int> <int> <dbl> <dbl>
1 1 20110127 3.23 NA
2 1 20110128 4.57 3.23
3 1 20110130 3.22 NA
4 1 20110131 4.33 3.22
5 1 20110201 1.22 4.33
6 1 20110202 5.23 1.22
7 2 20110127 0.86 NA
8 2 20110128 7.55 0.86
9 2 20110129 1.24 7.55
10 2 20110201 3.69 NA
Data
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
TIME = c(20110127L, 20110128L, 20110130L, 20110131L, 20110201L,
20110202L, 20110127L, 20110128L, 20110129L, 20110201L), VALUE = c(3.23,
4.57, 3.22, 4.33, 1.22, 5.23, 0.86, 7.55, 1.24, 3.69)), class = "data.frame", row.names = c(NA,
-10L))