I am looking to create a new column called true_water_on in the data frame trial_A:
study_ID randomisation water_on water_off
1 5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00
2 6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00
3 7 2021-01-03 10:00:00 <NA> <NA>
4 8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00
The conditions to populate it are as follows:
- If the "water_on" date and time precedes the date and time in "randomisation", then the randomisation date and time for that row is copied across into "true_water_on". This occurs for study_ID 6 and 8, as demonstrated by
  trial_A %>% mutate(TD_ran_waterstart = difftime(water_on, randomisation, units = "mins"))
- If the "water_on" date and time occurs after the date and time in "randomisation", then the water_on value for that row is copied across into "true_water_on".
- If no date and time is recorded in "water_on", then NA is marked in "true_water_on".
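A minimal base R sketch of that check (my own illustration, not part of the original question): difftime() is the function that takes the units argument, so the gap comes out in minutes, and a negative gap marks the rows where water_on precedes randomisation. The data frame is rebuilt with as.POSIXct() and without water_off so the snippet runs on its own; TD_ran_waterstart follows the question, while water_before_rand is a hypothetical helper column name.

```r
# Cut-down copy of trial_A (water_off omitted; as.POSIXct() used directly)
trial_A <- data.frame(
  study_ID = c(5, 6, 7, 8),
  randomisation = as.POSIXct(c("2021-01-01 11:00", "2021-01-02 10:00",
                               "2021-01-03 10:00", "2021-01-04 10:00")),
  water_on = as.POSIXct(c("2021-01-01 13:00", "2021-01-02 09:00", NA,
                          "2021-01-04 09:45"))
)

# units is an argument of difftime(), so the gap is reported in minutes
trial_A$TD_ran_waterstart <- difftime(trial_A$water_on, trial_A$randomisation,
                                      units = "mins")

# A negative gap means water started before randomisation;
# the comparison is NA where water_on is missing
trial_A$water_before_rand <- as.numeric(trial_A$TD_ran_waterstart) < 0
trial_A
```

Rows with water_before_rand equal to TRUE are the ones that should take the randomisation time.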
Data
trial_A <-
data.frame(study_ID=c(5, 6, 7, 8),
randomisation=as.POSIXlt(c("2021-01-01 11:00", "2021-01-02 10:00",
"2021-01-03 10:00", "2021-01-04 10:00")),
water_on=as.POSIXlt(c("2021-01-01 13:00", "2021-01-02 09:00", NA,
"2021-01-04 09:45")),
water_off=as.POSIXlt(c("2021-01-01 18:00", "2021-01-02 18:00", NA,
"2021-01-04 11:00")))
Answer:
As @IRTFM suggests, this looks like a simple application of ifelse, or case_when in dplyr.
library(dplyr)
trial_A %>%
  # >= so that a water_on equal to randomisation is kept instead of becoming NA
  mutate(true_water_on = case_when(water_on < randomisation ~ randomisation,
                                   water_on >= randomisation ~ water_on))
# study_ID randomisation water_on water_off true_water_on
#1 5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00 2021-01-01 13:00:00
#2 6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00 2021-01-02 10:00:00
#3 7 2021-01-03 10:00:00 <NA> <NA> <NA>
#4 8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00 2021-01-04 10:00:00
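In case_when, if none of the conditions match, NA is returned by default. When water_on is missing, both comparisons evaluate to NA, which case_when treats as no match, so study_ID 7 gets NA.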
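Since ifelse was mentioned: base R ifelse() drops attributes such as the POSIXct class (its help page warns about this), so used directly here it returns bare numbers of seconds rather than date-times. Below is a minimal base R sketch, assuming the same data rebuilt with as.POSIXct(), that keeps the date-time class by starting from water_on and overwriting only the rows where water started before randomisation; plain, early and the standalone true_water_on vector are illustrative names, not from the thread.

```r
trial_A <- data.frame(
  study_ID = c(5, 6, 7, 8),
  randomisation = as.POSIXct(c("2021-01-01 11:00", "2021-01-02 10:00",
                               "2021-01-03 10:00", "2021-01-04 10:00")),
  water_on = as.POSIXct(c("2021-01-01 13:00", "2021-01-02 09:00", NA,
                          "2021-01-04 09:45"))
)

# Base ifelse() drops the POSIXct class, so this is a plain numeric vector
plain <- with(trial_A, ifelse(water_on < randomisation, randomisation, water_on))
class(plain)

# Class-preserving alternative: water_on is already NA where no time was
# recorded; which() ignores the NA comparisons, so only early rows change
true_water_on <- trial_A$water_on
early <- which(trial_A$water_on < trial_A$randomisation)
true_water_on[early] <- trial_A$randomisation[early]
trial_A$true_water_on <- true_water_on
trial_A
```

dplyr's if_else() is another option that keeps the class, since it checks that both branches have the same type.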
Answer:
You could simply take the row-wise max of randomisation and water_on (columns 2 and 3):
transform(trial_A, true_water_on=apply(trial_A[2:3], 1, max))
# study_ID randomisation water_on water_off true_water_on
# 1 5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00 2021-01-01 13:00:00
# 2 6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00 2021-01-02 10:00:00
# 3 7 2021-01-03 10:00:00 <NA> <NA> <NA>
# 4 8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00 2021-01-04 10:00:00
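One caveat with the apply() version: apply() first converts the data frame to a character matrix, so true_water_on ends up stored as text rather than as a date-time. A sketch of the same "take the later of the two times" idea using pmax(), the element-wise (parallel) maximum, which works on the POSIXct columns directly and keeps the class; with the default na.rm = FALSE the row with no water_on stays NA. The data are rebuilt with as.POSIXct() so the snippet is self-contained.

```r
trial_A <- data.frame(
  study_ID = c(5, 6, 7, 8),
  randomisation = as.POSIXct(c("2021-01-01 11:00", "2021-01-02 10:00",
                               "2021-01-03 10:00", "2021-01-04 10:00")),
  water_on = as.POSIXct(c("2021-01-01 13:00", "2021-01-02 09:00", NA,
                          "2021-01-04 09:45"))
)

# pmax() compares the two columns element by element and keeps the later
# date-time; na.rm = FALSE (the default) propagates a missing water_on as NA
out <- transform(trial_A, true_water_on = pmax(randomisation, water_on))
out
class(out$true_water_on)
```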