I have a column with timestamps formatted like this printing one row it looks like this for example:
"2022-09-21 02:02:03 UTC"
There are many rows that have the same timestamp, but I want another column with the first timestamp after a specific times based on a vector that looks like this:
start_1 <- strptime("07:15:00, format= "%H:%M:%S")
start_2 <- strptime("09:15:00, format= "%H:%M:%S")
Based on my start_1 timestamp I want to check in my df below which is the first timestamp after start_1 and mark that one in another column called start_datetime.
I only want to mark the first timestamp after the start in my dataset, this is crucial for what I am trying to do, since many of the rows have exactly the same timestamp.
Datetime | start_datetime |
---|---|
2022-09-21 07:02:03 | NA |
2022-09-21 07:02:03 | NA |
2022-09-21 07:14:03 | NA |
2022-09-21 07:16:03 | 07:16 |
2022-09-21 07:16:03 | NA |
2022-09-21 09:19:03 | 09:19 |
I've explored multiple options, but so far I've been struggling to find a solution.
CodePudding user response:
Try something like this using dplyr
and lubridate
. To use a variable from the Global Environment "start_1" and "start_2" try .env
or !!
library(dplyr)
library(lubridate)
start_1 <- hms("07:15:00")
start_2 <- hms("09:15:00")
df %>%
mutate(time = hms(sapply(strsplit(Datetime, " "), "[", 2)),
d1 = time > .env$start_1,
d2 = time > .env$start_2,
start_datetime =
case_when(lag(d1) != d1 ~ paste0(hour(time), ":", minute(time)),
lag(d2) != d2 ~ paste0(hour(time), ":", minute(time)))) %>%
select(Datetime, start_datetime)
Datetime start_datetime
1 2022-09-21 07:02:03 <NA>
2 2022-09-21 07:02:03 <NA>
3 2022-09-21 07:14:03 <NA>
4 2022-09-21 07:16:03 7:16
5 2022-09-21 07:16:03 <NA>
6 2022-09-21 09:19:03 9:19
Data
df <- structure(list(Datetime = c("2022-09-21 07:02:03", "2022-09-21 07:02:03",
"2022-09-21 07:14:03", "2022-09-21 07:16:03", "2022-09-21 07:16:03",
"2022-09-21 09:19:03")), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
I'm assuming that this is a "per day" thing, meaning that your start_1
and start_2
just reflect time-of-day (which means strptime
is not doing what you want).
If your Datetime
column is class character
, then we can make some assumptions about lexicographic sorting and do something like this:
start_1 <- "07:15:00"; start_2 <- "09:15:00"
library(dplyr)
# library(purrr) # map_dfc
quux %>%
mutate(
Date = substring(Datetime, 1, 10),
purrr::map_dfc(list(s1 = start_1, s2 = start_2),
~ if_else(substring(Datetime, nchar(Datetime) - 7, nchar(Datetime)) > .,
., .[NA]))
) %>%
group_by(Date) %>%
mutate(
across(c(s1, s2), ~ replace(., duplicated(.), NA)),
start_datetime = coalesce(s1, s2)
) %>%
ungroup() %>%
select(-Date, -s1, -s2)
# # A tibble: 6 x 2
# Datetime start_datetime
# <chr> <chr>
# 1 2022-09-21 07:02:03 NA
# 2 2022-09-21 07:02:03 NA
# 3 2022-09-21 07:14:03 NA
# 4 2022-09-21 07:16:03 07:15:00
# 5 2022-09-21 07:16:03 NA
# 6 2022-09-21 09:19:03 09:15:00
In fact, we can generalize this to any number of starts
.
fun <- function(time, starts) {
tmp <- which(
apply(outer(time, starts, `>`), 2, function(z) replace(z, c(FALSE, z[-length(z)]), FALSE)),
arr.ind = TRUE)
replace(time[NA], tmp[,"row"], starts[tmp[,"col"]])
}
starts <- c("07:15:00", "09:15:00")
quux %>%
mutate(
Date = substring(Datetime, 1, 10),
Time = substring(Datetime, 12, 19)
) %>%
group_by(Date) %>%
mutate(start_datetime = fun(Time, starts)) %>%
ungroup() %>%
select(-Date, -Time)
# # A tibble: 6 x 2
# Datetime start_datetime
# <chr> <chr>
# 1 2022-09-21 07:02:03 NA
# 2 2022-09-21 07:02:03 NA
# 3 2022-09-21 07:14:03 NA
# 4 2022-09-21 07:16:03 07:15:00
# 5 2022-09-21 07:16:03 NA
# 6 2022-09-21 09:19:03 09:15:00