I have a long historical data like this format (unbalanced). While there is a lag until the data is released (next business day), I would like to record the date as of the day it happened. I tried to use dplyr as follows:
dataframe<-dataframe%>%group_by(date)%>%mutate(cob=lag(date,n=1))
However, it just produces the same result as:
lag(date,1)
date | name | value |
---|---|---|
2023/1/2 | a | X |
2023/1/2 | b | X |
2023/1/2 | c | X |
2023/1/3 | a | X |
2023/1/3 | b | X |
2023/1/4 | a | X |
2023/1/4 | b | X |
2023/1/5 | a | X |
2023/1/5 | b | X |
2023/1/5 | c | X |
I thought about:
dataframe<-dataframe%>%group_by(name)%>%mutate(cob=lag(date,n=1))
but it produces NA when there is no observation for a certain sample.
mutate(cob=date-1)
is not considering business day.
I just would like to slide all the dates in dataframe$date by 1 business day.
I attached the part of the actual data (historical prices of Japanese treasury bills).
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
The expected outcome is as follows:
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99), cob = c(NA,
NA, "2002-08-06", "2002-08-06", "2002-08-07", "2002-08-07")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
Thank you very much in advance.
CodePudding user response:
If I understand correctly, you want the previous date recorded in your date
column as cob
. So, your Aug 9 rows would have the previously recorded date of Aug 7 in your cob
column.
If so, you could try the following. First, your example data above is grouped so I started with ungroup
. You can get a vector of unique or distinct
dates, and get the lag
or previous date for those dates. In this case, dates of Aug 6, 7, and 9 will have cob
set as NA, Aug 6, and Aug 7.
Then, you can join back to original data with right_join
. The final select
will keep columns and include order desired.
I left date
alone (currently is character value, not in date format).
library(tidyverse)
df %>%
ungroup() %>%
distinct(date) %>%
mutate(cob = lag(date)) %>%
right_join(df) %>%
select(date, code, due_date, ave_price, cob)
Output
date code due_date ave_price cob
<chr> <int> <chr> <dbl> <chr>
1 2002-08-06 2870075 2002-08-20 100. NA
2 2002-08-06 3000075 2002-09-10 100. NA
3 2002-08-07 2870075 2002-08-20 100. 2002-08-06
4 2002-08-07 3000075 2002-09-10 100. 2002-08-06
5 2002-08-09 2870075 2002-08-20 100. 2002-08-07
6 2002-08-09 3000075 2002-09-10 100. 2002-08-07