Home > Blockchain >  Calculate or Filter the wrong date entries of two Date columns in R
Calculate or Filter the wrong date entries of two Date columns in R

Time:06-25

I am trying to figure how I am going to filter the wrong entries or calculate the difference between two Date columns of the same data frame in R. The scenario is: I have Patient table and there are two columns of Patient_admit and Patient discharge. How I am going to find if the date entered for Patient_discharge is before the Patient_admit. Below is the dataframe example in which the entry of patient 2 and 6 is wrong.

Patient Dataframe

executed

 dput(head(patient)
    structure(list(id = c(1003L, 1005L, 1006L, 1007L, 1010L, 1010L
    ),  date_admit = structure(c(115L, 18L, 138L, 
    91L, 34L, 278L), .Label = c("01/01/2020", "01/02/2020", "01/03/2020",............,
 date_discharge = structure(c(143L, 130L, 181L, 156L, 198L, 
        86L), .Label = c("01/01/2020", "01/01/2021", "01/02/2020", 
       ............., class = "factor")), row.names = c(NA, 6L), class = "data.frame")

The list of date is very long so I just put "..........." for ease of understanding. Thanks

CodePudding user response:

Another possible solution, based on lubridate::dmy:

library(dplyr)
library(lubridate)

df %>% 
  filter(dmy(Patient_admit) <= dmy(Patient_discharge))

#>   Patient_ID Patient_admit Patient_discharge
#> 1          1    20/10/2020        21/10/2020
#> 2          3    21/10/2021        22/10/2021
#> 3          4    25/11/2022        25/11/2022
#> 4          5    25/11/2022        26/11/2022

CodePudding user response:

First convert your dates to the right format using strptime. Calculate the difference in days using difftime and filter if the days are negative. You can use the following code:

library(dplyr)
df %>%
  mutate(Patient_admit = strptime(Patient_admit, "%d/%m/%Y"),
         Patient_discharge = strptime(Patient_discharge, "%d/%m/%Y")) %>%
  mutate(diff_days = difftime(Patient_discharge, Patient_admit, units = c("days"))) %>%
  filter(diff_days >= 0) %>%
  select(-diff_days)

Output:

  Patient_ID Patient_admit Patient_discharge
1          1    2020-10-20        2020-10-21
2          3    2021-10-21        2021-10-22
3          4    2022-11-25        2022-11-25
4          5    2022-11-25        2022-11-26

Data

df <- data.frame(Patient_ID = c(1,2,3,4,5,6),
                 Patient_admit = c("20/10/2020", "22/10/2021", "21/10/2021", "25/11/2022", "25/11/2022", "05/10/2020"),
                 Patient_discharge = c("21/10/2020", "20/10/2021", "22/10/2021", "25/11/2022", "26/11/2022", "20/09/2020"))
  • Related