My df looks like this:
SERIAL quest time_d1_1 time_d1_2 time_d2_1 time_d2_2 STARTED V01
F3L d1_1 05:00 17:30 05:15 17:45 2022-01-08 05:06:19 5
F3L d1_2 05:00 17:30 05:15 17:45 2022-01-08 17:30:07 2
F3L d2_1 05:00 17:30 05:15 17:45 2022-01-08 8:36:54 1
F3L d2_2 05:00 17:30 05:15 17:45 2022-01-08 18:10:07 7
7HG d1_1 05:00 17:30 05:15 17:45 2022-01-08 05:33:15 4
7HG d1_2 05:00 17:30 05:15 17:45 2022-01-08 18:49:22 2
7HG d2_1 05:00 17:30 05:15 17:45 2022-01-08 07:33:15 2
7HG d2_2 05:00 17:30 05:15 17:45 2022-01-08 18:29:22 6
SERIAL
= Identifierquest
= [day X] _ [measurement]; "d1_1" = day one, measurement onetime_d1_1
= the reference time (hh:mm) for day one, measurement 1- ...
time_d2_2
= the reference time (hh:mm) for day two, measurement 2STARTED
= the day & time (yyyy:mm:dd hh:mm:ss) when each measurement was startedV01
= some values belonging to each quest
For each row, I would like to set the variable V01
to NA
, when the variable STARTED
is more than an hour later than the reference time (time_d1_1
to time_d2_2
), in relation to the quest
variable.
Example: In row 3, SERIAL=F3L
started day two, measurement one (quest=d2_1
) at 8:36:54
. However, the reference time (time_d2_1
) is 05:15
. I would now set V01=1
to NA, as 8:36:54
is larger than 05:15 hour(1)
.
Unfortunately, I have this weird format so I struggle to solve this with the mutate-ifelse() or mutate-case_when() functions. Can anyone help, preferably a tidyverse solution?
The data:
Dat <- structure(list(SERIAL = c("F3L","F3L","F3L","F3L","7HG","7HG","7HG","7HG"),
quest = c("d1_1","d1_2","d2_1","d2_2","d1_1","d1_2","d2_1","d2_2"),
time_d1_1 = c("05:00","05:00","05:00","05:00","05:30","05:30","05:30","05:30"),
time_d1_2 = c("17:30","17:30","17:30","17:30","18:10","18:10","18:10","18:10"),
time_d2_1 = c("05:15","05:15","05:15","05:15","05:30","05:30","05:30","05:30"),
time_d2_2 = c("17:45","17:45","17:45","17:45","18:00","18:00","18:00","18:00"),
STARTED = c("2022-01-08 05:06:19","2022-01-08 17:30:07","2022-01-09 8:36:54",
"2022-01-09 18:10:07","2021-09-04 05:33:15","2021-09-04 18:49:22",
"2021-09-05 07:33:15","2021-09-05 18:29:22"),
V01 = c(5,3,1,7,4,2,2,6)),
class = "data.frame",
row.names = c(NA, -8L))
CodePudding user response:
We can turn the table into long form with tidyr::pivot_longer
, do the calculation, and turn into wide form (with tidyr::pivot_wider
) again. See inline comments in example. I used lubridate
to parse the datetime object; this could probably also be done using base R.
library(dplyr)
library(tidyr)
library(lubridate)
Dat %>%
pivot_longer(starts_with("time_"), names_prefix = "time_") %>% # turn into long form
filter(quest == name) %>% # keep only the record where the `time_dX_Y` column matches `quest`
mutate(in_time = value > strftime(as_datetime(STARTED) - hours(1), "%H:%M:%S", tz = "UTC"),) %>% # calculate whether was in time
pivot_wider(names_from = name, names_prefix = "time_", values_from = value) %>% # turn into wide form again
group_by(SERIAL) %>%
mutate(across(starts_with("time_"), function(x) first(x[!is.na(x)]))) # fill missings in `time_dX_Y` columns caused by the filter above
Which gives
SERIAL quest STARTED V01 in_time time_d1_1 time_d1_2 time_d2_1 time_d2_2
<chr> <chr> <chr> <dbl> <lgl> <chr> <chr> <chr> <chr>
1 F3L d1_1 2022-01-08 05:06:19 5 TRUE 05:00 17:30 05:15 17:45
2 F3L d1_2 2022-01-08 17:30:07 3 TRUE 05:00 17:30 05:15 17:45
3 F3L d2_1 2022-01-09 8:36:54 1 FALSE 05:00 17:30 05:15 17:45
4 F3L d2_2 2022-01-09 18:10:07 7 TRUE 05:00 17:30 05:15 17:45
5 7HG d1_1 2021-09-04 05:33:15 4 TRUE 05:30 18:10 05:30 18:00
6 7HG d1_2 2021-09-04 18:49:22 2 TRUE 05:30 18:10 05:30 18:00
7 7HG d2_1 2021-09-05 07:33:15 2 FALSE 05:30 18:10 05:30 18:00
8 7HG d2_2 2021-09-05 18:29:22 6 TRUE 05:30 18:10 05:30 18:00