How to copy a value from one column to a new column if column A

I am looking to create a new column called true_water_on in the dataframe trial_A.

  study_ID       randomisation            water_on           water_off
1        5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00
2        6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00
3        7 2021-01-03 10:00:00                <NA>                <NA>
4        8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00

The conditions to populate it are as follows

If "water_on" date and time precedes the time and date in "randomisation" then the randomisation data and time for that row is copied across into "true_water_on". This occurs for study_ID 6 and 8; as demonstrated by

trial_A %>% mutate(TD_ran_waterstart = water_on - randomisation, units="mins")

If "water_on" date and time occurs after the time and date in "randomisation" then the water_on data for that row is copied across into "water_drug_on"
If there is no data and time recorded in "water_on" then NA is marked in "true_water_on"

Data

trial_A <- 
  data.frame(study_ID=c(5, 6, 7, 8), 
             randomisation=as.POSIXlt(c("2021-01-01 11:00", "2021-01-02 10:00", 
                                        "2021-01-03 10:00", "2021-01-04 10:00")), 
             water_on=as.POSIXlt(c("2021-01-01 13:00", "2021-01-02 09:00", NA, 
                                   "2021-01-04 09:45")), 
             water_off=as.POSIXlt(c("2021-01-01 18:00", "2021-01-02 18:00", NA, 
                                    "2021-01-04 11:00")))

CodePudding user response：

As @IRTFM suggests this looks like a simple application of ifelse or case_when in dplyr.

library(dplyr)

trial_A %>%
  mutate(true_water_on = case_when(water_on < randomisation ~ randomisation, 
                                   water_on > randomisation ~ water_on))

#  study_ID       randomisation            water_on           water_off       true_water_on
#1        5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00 2021-01-01 13:00:00
#2        6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00 2021-01-02 10:00:00
#3        7 2021-01-03 10:00:00                <NA>                <NA>                <NA>
#4        8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00 2021-01-04 10:00:00

In case_when if none of the conditions match it returns NA by default.

CodePudding user response：

You could simply use max.

transform(trial_A, true_water_on=apply(trial_A[2:3], 1, max))
#  study_ID       randomisation            water_on           water_off       true_water_on
# 1        5 2021-01-01 11:00:00 2021-01-01 13:00:00 2021-01-01 18:00:00 2021-01-01 13:00:00
# 2        6 2021-01-02 10:00:00 2021-01-02 09:00:00 2021-01-02 18:00:00 2021-01-02 10:00:00
# 3        7 2021-01-03 10:00:00                <NA>                <NA>                <NA>
# 4        8 2021-01-04 10:00:00 2021-01-04 09:45:00 2021-01-04 11:00:00 2021-01-04 10:00:00