How to calculate mean value in R between initial timepoint and 24 hours later-CodePudding

I have a healthcare dataset in which I have to calculate the average score of a value within the first 24 hours of an ICU admission and I am playing around with dplyr and the filter function and the between statement but I keep getting 0 observations.

For example I have this dataset with reproducible data including converting the date time to POSIXct because that's what I have in my original dataset:

test<-structure(list(MRN=structure(c(001, 001, 001, 001, 001, 002, 002, 002), 
                   label="MRN"),
                   Encounter_ID=structure(c(001, 001, 001, 001, 001, 100, 100, 100), label="Encounter_ID"),
                   ICU_FIRST=structure(c("2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-03-04 05:00:00", "2018-03-04 05:00:00", "2018-03-04 05:00:00"), label="ICU_FIRST"),
                   Record_Time=structure(c("2018-01-03 20:00:00", "2018-01-04 00:10:00", "2018-01-04 09:20:00", "2018-01-04 13:00:00", "2018-01-06 13:00:00", "2018-03-04 13:00:00", "2018-03-04 17:00:00", "2018-03-05 13:00:00")),
                   Value=structure(c(5, 5, 3, 9, 4, 7, 10, 20))),
                row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

test$ICU_FIRST<-as.POSIXlt(test[["ICU_FIRST"]],"%m-%d-%Y %H:%M")
test$Record_Time<-as.POSIXlt(test[["Record_Time"]],"%m-%d-%Y %H:%M")

^I want to calculate the mean value for each encounter ID whose Record_Time is >=ICU_FIRST and <=ICU_FIRST 24 hours. In such an instance, for Encounter_ID 001, I would want rows 1-4 but not row 5 counted in calculating the mean value. Likewise for Encounter ID 002, I would want rows 6 and 7 included in calculating the mean value, but not row 8. I tried playing around with this code in dplyr:

df %>%
  arrange(MRN, Encounter_ID, ICU_FIRST, Record_Time)%>%
  group_by(Encounter_ID, ICU_FIRST, Record_Time)%>%
  filter(between(Record_Time, ICU_FIRST, (ICU_FIRST 86400))) %>%
  summarise(mean_value= mean(value))

Any suggestions for how I can troubleshoot this code would be much appreciated, thank you! I'm also open to solutions using other packages aside from dplyr.

CodePudding user response：

A very simple base-r solution uses aggregate

aggregate(Value ~ Encounter_ID, 
    data = test[test$Record_Time >= test$ICU_FIRST & test$Record_Time <= (test$ICU_FIRST   60 * 60 * 24),], 
    FUN = mean
)

The first argument is your formula, in the format numeric ~ grouping. The data argument is where you apply your 24h (in this case, 60s * 60m * 24h) time window filtering. FUN specifies that you want to apply the mean function to the rows that you have not filtered out.

This can be done with pipes and dplyr as well if you prefer that syntax.

CodePudding user response：

Here a tidyverse solution.

test<-structure(list(MRN=structure(c(001, 001, 001, 001, 001, 002, 002, 002), 
                   label="MRN"),
                   Encounter_ID=structure(c(001, 001, 001, 001, 001, 100, 100, 100), label="Encounter_ID"),
                   ICU_FIRST=structure(c("2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-01-03 19:37:00", "2018-03-04 05:00:00", "2018-03-04 05:00:00", "2018-03-04 05:00:00"), label="ICU_FIRST"),
                   Record_Time=structure(c("2018-01-03 20:00:00", "2018-01-04 00:10:00", "2018-01-04 09:20:00", "2018-01-04 13:00:00", "2018-01-06 13:00:00", "2018-03-04 13:00:00", "2018-03-04 17:00:00", "2018-03-05 13:00:00")),
                   Value=structure(c(5, 5, 3, 9, 4, 7, 10, 20))),
                row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

# test$ICU_FIRST<-as.POSIXlt(test[["ICU_FIRST"]],"%m-%d-%Y %H:%M", tz="")
# test$Record_Time<-as.POSIXlt(test[["Record_Time"]],"%m-%d-%Y %H:%M")

test
#> # A tibble: 8 × 5
#>     MRN Encounter_ID ICU_FIRST           Record_Time         Value
#>   <dbl>        <dbl> <chr>               <chr>               <dbl>
#> 1     1            1 2018-01-03 19:37:00 2018-01-03 20:00:00     5
#> 2     1            1 2018-01-03 19:37:00 2018-01-04 00:10:00     5
#> 3     1            1 2018-01-03 19:37:00 2018-01-04 09:20:00     3
#> 4     1            1 2018-01-03 19:37:00 2018-01-04 13:00:00     9
#> 5     1            1 2018-01-03 19:37:00 2018-01-06 13:00:00     4
#> 6     2          100 2018-03-04 05:00:00 2018-03-04 13:00:00     7
#> 7     2          100 2018-03-04 05:00:00 2018-03-04 17:00:00    10
#> 8     2          100 2018-03-04 05:00:00 2018-03-05 13:00:00    20


library(tidyverse)

test %>%
mutate(ICU_FIRST=lubridate::ymd_hms(ICU_FIRST)) %>%
mutate(icu_24=ICU_FIRST lubridate::hours(24)) %>%
filter(Record_Time>=ICU_FIRST)  %>%
filter(Record_Time<=icu_24) %>%
group_by(Encounter_ID) %>%
summarise(mean_value=mean(Value))
#> # A tibble: 2 × 2
#>   Encounter_ID mean_value
#>          <dbl>      <dbl>
#> 1            1       5.67
#> 2          100       8.5

^{Created on 2022-10-14 with reprex v2.0.2}