I have two tables, which I would like to join by date. If an instance falls within a color period, I would like that instance to receive said color as a variable (see joined):
instance <- read.table(header=TRUE, text="
ID UTC_DateTime Value
1 2022-06-15 15:00:00 8
2 2022-06-15 15:05:00 3
3 2022-06-15 15:10:00 2
4 2022-06-15 15:15:00 4
")
periods <- read.table(header=TRUE, text="
COLOR UTC_DateTime
Start_Green 2022-06-15 14:56:22
End_Green 2022-06-15 15:03:08
Start_Red 2022-06-15 15:03:11
End_Red 2022-06-15 15:58:48
")
joined <- read.table(header=TRUE, text="
ID UTC_DateTime Value Color
1 2022-06-15 15:00:00 8 Green
2 2022-06-15 15:05:00 3 Green
3 2022-06-15 15:10:00 2 Red
4 2022-06-15 15:15:00 4 Red
")
Also, if you could somehow incorporate a process which converts periods
to periods_wide
that would be great, see:
periods_wide <- read.table(header=TRUE, text="
COLOR UTC_DateTime_Start UTC_DateTime_End
Green 2022-06-15 14:56:22 2022-06-15 15:03:08
Red 2022-06-15 15:03:11 2022-06-15 15:58:48
")
Please be aware that there might be many more instances and also many more periods (of the same or other colors)
CodePudding user response:
A possible solution, based on powerjoin::power_inner_join
and lubridate
. Notice that read.table
was not reading the data as intended and, therefore, I used read.csv
:
library(tidyverse)
library(powerjoin)
library(lubridate)
instance <- read.csv(header=TRUE, text="
ID, UTC_DateTime, Value
1, 2022-06-15 15:00:00, 8
2, 2022-06-15 15:05:00, 3
3, 2022-06-15 15:10:00, 2
4, 2022-06-15 15:15:00, 4
", strip.white = T)
periods <- read.csv(header=TRUE, text="
COLOR, UTC_DateTime
Start_Green, 2022-06-15 14:56:22
End_Green, 2022-06-15 15:03:08
Start_Red, 2022-06-15 15:03:11
End_Red, 2022-06-15 15:58:48
", strip.white = T)
periods %>%
mutate(COLOR = str_remove(COLOR, "Start_|End_"),
name = rep(c("d1", "d2"), 2)) %>%
pivot_wider(COLOR, values_from = UTC_DateTime) %>%
power_inner_join(instance,.,
by = c(~ ymd_hms(.x$UTC_DateTime) <= ymd_hms(.y$d2),
~ ymd_hms(.x$UTC_DateTime) >= ymd_hms(.y$d1))) %>%
select(-(d1:d2))
#> ID UTC_DateTime Value COLOR
#> 1 1 2022-06-15 15:00:00 8 Green
#> 2 2 2022-06-15 15:05:00 3 Red
#> 3 3 2022-06-15 15:10:00 2 Red
#> 4 4 2022-06-15 15:15:00 4 Red