I am looking for a data.table solution to the following question.
Suppose I have the following dataset:
library(data.table)
library(lubridate)
DT <- data.table(date = c("2019-01-01 00:03:04",
                          "2019-01-01 00:07:03",
                          "2019-01-01 00:15:23",
                          "2019-01-01 00:16:28",
                          "2019-01-01 00:21:30"))
DT
date
1: 2019-01-01 00:03:04
2: 2019-01-01 00:07:03
3: 2019-01-01 00:15:23
4: 2019-01-01 00:16:28
5: 2019-01-01 00:21:30
I am looking to map each date into a 15-minute interval within each day. So the final dataset would look like:
date date_15min_grp
1: 2019-01-01 00:03:04 1
2: 2019-01-01 00:07:03 1
3: 2019-01-01 00:15:23 2
4: 2019-01-01 00:16:28 2
5: 2019-01-01 00:21:30 2
The groups would be unique at the day level, so 2019-01-01 00:03:04 and 2019-01-02 00:03:04 would both return a value of 1. My current solution is to extract the hour/minute/second components of each timestamp, convert them into minutes, add them up, divide by 60, and map each value to the nearest "group". But this seems slow and inelegant.
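In code, what I have in mind is roughly this (a sketch of the approach described above, after converting `date` to POSIXct):

```r
library(data.table)
library(lubridate)

DT <- data.table(date = as.POSIXct(c("2019-01-01 00:03:04",
                                     "2019-01-01 00:07:03",
                                     "2019-01-01 00:15:23",
                                     "2019-01-01 00:16:28",
                                     "2019-01-01 00:21:30")))

# Minutes elapsed since midnight, integer-divided into 15-minute bins
DT[, date_15min_grp := (hour(date) * 60L + minute(date)) %/% 15L + 1L][]
```

Since the bin index depends only on the time of day, the same timestamp on any date gets the same group, as required.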
A data.table/lubridate solution would be greatly appreciated.
Thanks so much!
CodePudding user response:
Please find below a solution using data.table and lubridate.
Reprex
- Code
library(data.table)
library(lubridate)
DT[, date_15min_grp := fcase(minute(date) < 15, 1,
                             minute(date) < 30, 2,
                             minute(date) < 45, 3,
                             default = 4)][]
- Output
#> date date_15min_grp
#> 1: 2019-01-01 00:03:04 1
#> 2: 2019-01-01 00:07:03 1
#> 3: 2019-01-01 00:15:23 2
#> 4: 2019-01-01 00:16:28 2
#> 5: 2019-01-01 00:21:30 2
Created on 2021-11-30 by the reprex package (v2.0.1)
AS A FOLLOW-UP TO YOUR COMMENT
- Code
library(data.table)
library(lubridate)
DT[, date_15min_grp := fcase(minute(date) < 15, hour(date)*4 + 1,
                             minute(date) < 30, hour(date)*4 + 2,
                             minute(date) < 45, hour(date)*4 + 3,
                             minute(date) < 60, hour(date)*4 + 4)][]
- Output
#> date date_15min_grp
#> 1: 2019-01-01 00:03:04 1
#> 2: 2019-01-01 00:07:03 1
#> 3: 2019-01-01 00:15:23 2
#> 4: 2019-01-01 00:16:28 2
#> 5: 2019-01-01 00:21:30 2
Created on 2021-12-01 by the reprex package (v2.0.1)
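The same 1-to-96 per-day numbering can also be written without fcase as plain integer arithmetic, which may be faster on large tables (a sketch, using the data from the question):

```r
library(data.table)
library(lubridate)

DT <- data.table(date = as.POSIXct(c("2019-01-01 00:03:04",
                                     "2019-01-01 00:07:03",
                                     "2019-01-01 00:15:23",
                                     "2019-01-01 00:16:28",
                                     "2019-01-01 00:21:30")))

# Four quarter-hours per hour, plus the quarter within the current hour
DT[, date_15min_grp := hour(date) * 4L + minute(date) %/% 15L + 1L][]
```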
CodePudding user response:
You should see if the slider package works for you. It can both cut the data into blocks of rows by time period and apply functions over sliding windows, similar to the apply/purrr functionality.
library(tibble)
library(slider)
library(dplyr)

# Note: the last two timestamps were originally written as "00:61:23" and
# "00:66:28", which are invalid times and will not parse; they are
# interpreted here as "01:01:23" and "01:06:28".
f <- data.frame(date = as.POSIXct(c(
  "2019-01-01 00:03:04", "2019-01-01 00:07:03", "2019-01-01 00:15:23",
  "2019-01-01 00:16:28", "2019-01-01 00:21:30", "2019-01-01 00:22:03",
  "2019-01-01 00:25:23", "2019-01-01 00:36:28", "2019-01-01 00:41:30",
  "2019-01-01 00:47:03", "2019-01-01 00:48:23", "2019-01-01 00:51:28",
  "2019-01-01 00:51:30", "2019-01-01 00:57:03", "2019-01-01 01:01:23",
  "2019-01-01 01:06:28"))) %>% arrange(date)

# Split the rows into a list of data frames, one per 15-minute period
g <- block(f, f$date, period = "minute", every = 15)
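If the goal is just 15-minute group ids rather than sliding windows, lubridate::floor_date gives midnight-anchored bins directly (a minimal sketch, shown on the question's five timestamps; `bin` and `grp` are illustrative column names):

```r
library(lubridate)

f <- data.frame(date = as.POSIXct(c("2019-01-01 00:03:04",
                                    "2019-01-01 00:07:03",
                                    "2019-01-01 00:15:23",
                                    "2019-01-01 00:16:28",
                                    "2019-01-01 00:21:30")))

# Floor each timestamp to the start of its 15-minute bin, then
# number the bins within the day (1 to 96)
f$bin <- floor_date(f$date, "15 minutes")
f$grp <- hour(f$bin) * 4L + minute(f$bin) %/% 15L + 1L
f
```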
CodePudding user response:
Two things:

1. I think your 5th row should be group 2, since it is before 00:30:00 of the day. Please clarify if this is incorrect.
2. You say it needs to be per-day, but your data only includes one day; I'll augment it to demonstrate the per-day calculations.
DT[, date := as.POSIXct(date)]
DT2 <- rbindlist(list(DT, copy(DT)[, date := date + 86400]))
DT2
#                    date
#                  <POSc>
#  1: 2019-01-01 00:03:04
#  2: 2019-01-01 00:07:03
#  3: 2019-01-01 00:15:23
#  4: 2019-01-01 00:16:28
#  5: 2019-01-01 00:21:30
#  6: 2019-01-02 00:03:04
#  7: 2019-01-02 00:07:03
#  8: 2019-01-02 00:15:23
#  9: 2019-01-02 00:16:28
# 10: 2019-01-02 00:21:30
(The copy(DT) matters: without it, := would shift DT itself by reference, and both halves of DT2 would land on the second day.)
And the group-assignment:
DT2[, day := format(date, format = "%Y%m%d")
  ][, grp := findInterval(
        date,
        seq(lubridate::floor_date(min(date), unit = "hours"),
            max(date) + 3600, by = "15 mins")),
    by = day][]
#                    date      day   grp
#                  <POSc>   <char> <int>
#  1: 2019-01-01 00:03:04 20190101     1
#  2: 2019-01-01 00:07:03 20190101     1
#  3: 2019-01-01 00:15:23 20190101     2
#  4: 2019-01-01 00:16:28 20190101     2
#  5: 2019-01-01 00:21:30 20190101     2
#  6: 2019-01-02 00:03:04 20190102     1
#  7: 2019-01-02 00:07:03 20190102     1
#  8: 2019-01-02 00:15:23 20190102     2
#  9: 2019-01-02 00:16:28 20190102     2
# 10: 2019-01-02 00:21:30 20190102     2