Home > OS >  how to create aggregate data from repeated data based on a date in R
how to create aggregate data from repeated data based on a date in R

Time:07-27

I have longitudinal patient data in R. I would like to create an aggregate table like table 2 below from table 1. so Table 2 would only have one row for each patient and have total counts of consultations before the registration date (column 3 in table 1) and total consultations after the registration date

Table1:

patid consultation_date registration_date consultation_count
1 07/07/2016 07/07/2018 1
1 07/07/2019 07/07/2018 1
1 07/07/2020 07/07/2018 1
2 14/08/2016 07/09/2016 1
2 07/05/2015 07/09/2016 1
2 02/12/2016 07/09/2016 1

Table 2:

patid consultation_count_pre_registration consultation_count_post_registration
1 1 2
2 2 1

CodePudding user response:

Similar to akrun in using tidyverse but slightly different approach:

library(dplyr)
library(tidyr)

consultations  |>
    mutate(period = ifelse(
        registration_date <= consultation_date, 
        "after registration",
        "before registration"
    )
    )  |>
    group_by(patid, period)  |>
    summarise(n = n())  |>
    pivot_wider(
        names_from = period, 
        values_from = n
    )

# A tibble: 2 x 3
# Groups:   patid [2]
#   patid `after registration` `before registration`
#   <int>                <int>                 <int>
# 1     1                    2                     1
# 2     2                    1                     2

Data

consultations  <- read.table(text = "patid  consultation_date   registration_date   consultation_count
1   07/07/2016  07/07/2018  1
1   07/07/2019  07/07/2018  1
1   07/07/2020  07/07/2018  1
2   14/08/2016  07/09/2016  1
2   07/05/2015  07/09/2016  1
2   02/12/2016  07/09/2016  1", h=T)

CodePudding user response:

We could convert the 'date' to Date class, then group by 'patid', get the sum of logical vector from the 'consultation_date' and 'registration_date'

library(dplyr)
library(lubridate)
df1 %>%
    mutate(across(ends_with('date'), dmy)) %>%
    group_by(patid) %>%
    summarise(
     count_pre = sum(consultation_date < registration_date, na.rm = TRUE),
      count_post = sum(consultation_date > registration_date, na.rm = TRUE), 
     .groups = 'drop')

-output

# A tibble: 2 × 3
  patid count_pre count_post
  <int>     <int>      <int>
1     1         1          2
2     2         2          1

data

df1 <- structure(list(patid = c(1L, 1L, 1L, 2L, 2L, 2L), 
consultation_date = c("07/07/2016", 
"07/07/2019", "07/07/2020", "14/08/2016", "07/05/2015", "02/12/2016"
), registration_date = c("07/07/2018", "07/07/2018", "07/07/2018", 
"07/09/2016", "07/09/2016", "07/09/2016"), consultation_count = c(1L, 
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-6L))
  • Related