Delete Row based on Proximity to Date-CodePudding

I have a dataset consisting of patients and their encounters in hospital, each observation is an individual patient encounter, one of the columns is the patient ID, see below,

Some of the encounters are an operation date, I'm trying to get a list of encounters within 30 days after the operation date and am struggling, (using R),

See below an example of the data:

Patient ID	Encounter ID	Operation Date	Encounter Date
34	12	2022-01-01	2022-01-01
34	21	NA	2022-01-20
35	15	NA	2022-03-02
35	16	2022-04-01	2022-04-01
35	18	NA	2022-06-25

So for example, I would like to get Encounter 21, but not encounter 18,

If anyone could help me out that'd be much appreciated,

As a side note if anyone knows any packages which are good for dealing with this type of data, that would be really useful, the 'time series' packages I've tried don't seem to help for this type of problem.

CodePudding user response：

library(lubridate)
library(tidyverse)

df <- read_table("Patient_ID    Encounter_ID    Operation_Date  Encounter_Date
34  12  2022-01-01  2022-01-01
34  21  NA  2022-01-20
35  15  NA  2022-03-02
35  16  2022-04-01  2022-04-01
35  18  NA  2022-06-25")

df %>%  
  group_by(Patient_ID) %>%  
  filter(Encounter_Date <= first(Encounter_Date) %m % days(30)) %>%
  ungroup()

#> # A tibble: 4 x 4
#>   Patient_ID Encounter_ID Operation_Date Encounter_Date
#>        <dbl>        <dbl> <date>         <date>        
#> 1         34           12 2022-01-01     2022-01-01    
#> 2         34           21 NA             2022-01-20    
#> 3         35           15 NA             2022-03-02    
#> 4         35           16 2022-04-01     2022-04-01

^{Created on 2022-08-14 by the reprex package (v2.0.1)}

CodePudding user response：

How about using two filters?

library(dplyr)

df |>
  group_by(PatientID) |>
  filter(is.na(OperationDate),
         between(as.numeric(EncounterDate-min(OperationDate, na.rm = TRUE)), 0, 30)) |>
  ungroup()

Output:

# A tibble: 1 × 4
  PatientID EncounterID OperationDate EncounterDate
      <dbl>       <dbl> <date>        <date>       
1        34          21 NA            2022-01-20

Why it works:

We group by each patient with group_by
is.na(OperationDate) secures that we don't include the encounter of the operation.
between makes sure that we do not include encounters before the operation. E.g. encounter 15.
min selects the first non-NA date (in each group).

Data:

library(readr)

df <- read_table("PatientID EncounterID OperationDate   EncounterDate
34  12  2022-01-01  2022-01-01
34  21  NA  2022-01-20
35  15  NA  2022-03-02
35  16  2022-04-01  2022-04-01
35  18  NA  2022-06-25")