I have longitudinal patient data in R. I would like to create an aggregate table like table 2 below from table 1. so Table 2 would only have one row for each patient and have total counts of consultations before the registration date (column 3 in table 1) and total consultations after the registration date
Table1:
patid | consultation_date | registration_date | consultation_count |
---|---|---|---|
1 | 07/07/2016 | 07/07/2018 | 1 |
1 | 07/07/2019 | 07/07/2018 | 1 |
1 | 07/07/2020 | 07/07/2018 | 1 |
2 | 14/08/2016 | 07/09/2016 | 1 |
2 | 07/05/2015 | 07/09/2016 | 1 |
2 | 02/12/2016 | 07/09/2016 | 1 |
Table 2:
patid | consultation_count_pre_registration | consultation_count_post_registration |
---|---|---|
1 | 1 | 2 |
2 | 2 | 1 |
CodePudding user response:
Similar to akrun in using tidyverse
but slightly different approach:
library(dplyr)
library(tidyr)
consultations |>
mutate(period = ifelse(
registration_date <= consultation_date,
"after registration",
"before registration"
)
) |>
group_by(patid, period) |>
summarise(n = n()) |>
pivot_wider(
names_from = period,
values_from = n
)
# A tibble: 2 x 3
# Groups: patid [2]
# patid `after registration` `before registration`
# <int> <int> <int>
# 1 1 2 1
# 2 2 1 2
Data
consultations <- read.table(text = "patid consultation_date registration_date consultation_count
1 07/07/2016 07/07/2018 1
1 07/07/2019 07/07/2018 1
1 07/07/2020 07/07/2018 1
2 14/08/2016 07/09/2016 1
2 07/05/2015 07/09/2016 1
2 02/12/2016 07/09/2016 1", h=T)
CodePudding user response:
We could convert the 'date' to Date
class, then group by 'patid', get the sum
of logical vector from the 'consultation_date' and 'registration_date'
library(dplyr)
library(lubridate)
df1 %>%
mutate(across(ends_with('date'), dmy)) %>%
group_by(patid) %>%
summarise(
count_pre = sum(consultation_date < registration_date, na.rm = TRUE),
count_post = sum(consultation_date > registration_date, na.rm = TRUE),
.groups = 'drop')
-output
# A tibble: 2 × 3
patid count_pre count_post
<int> <int> <int>
1 1 1 2
2 2 2 1
data
df1 <- structure(list(patid = c(1L, 1L, 1L, 2L, 2L, 2L),
consultation_date = c("07/07/2016",
"07/07/2019", "07/07/2020", "14/08/2016", "07/05/2015", "02/12/2016"
), registration_date = c("07/07/2018", "07/07/2018", "07/07/2018",
"07/09/2016", "07/09/2016", "07/09/2016"), consultation_count = c(1L,
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-6L))