I am trying to figure how I am going to filter the wrong entries or calculate the difference between two Date columns of the same data frame in R. The scenario is: I have Patient table and there are two columns of Patient_admit and Patient discharge. How I am going to find if the date entered for Patient_discharge is before the Patient_admit. Below is the dataframe example in which the entry of patient 2 and 6 is wrong.
executed
dput(head(patient)
structure(list(id = c(1003L, 1005L, 1006L, 1007L, 1010L, 1010L
), date_admit = structure(c(115L, 18L, 138L,
91L, 34L, 278L), .Label = c("01/01/2020", "01/02/2020", "01/03/2020",............,
date_discharge = structure(c(143L, 130L, 181L, 156L, 198L,
86L), .Label = c("01/01/2020", "01/01/2021", "01/02/2020",
............., class = "factor")), row.names = c(NA, 6L), class = "data.frame")
The list of date is very long so I just put "..........." for ease of understanding. Thanks
CodePudding user response:
Another possible solution, based on lubridate::dmy
:
library(dplyr)
library(lubridate)
df %>%
filter(dmy(Patient_admit) <= dmy(Patient_discharge))
#> Patient_ID Patient_admit Patient_discharge
#> 1 1 20/10/2020 21/10/2020
#> 2 3 21/10/2021 22/10/2021
#> 3 4 25/11/2022 25/11/2022
#> 4 5 25/11/2022 26/11/2022
CodePudding user response:
First convert your dates to the right format using strptime
. Calculate the difference in days using difftime
and filter
if the days are negative. You can use the following code:
library(dplyr)
df %>%
mutate(Patient_admit = strptime(Patient_admit, "%d/%m/%Y"),
Patient_discharge = strptime(Patient_discharge, "%d/%m/%Y")) %>%
mutate(diff_days = difftime(Patient_discharge, Patient_admit, units = c("days"))) %>%
filter(diff_days >= 0) %>%
select(-diff_days)
Output:
Patient_ID Patient_admit Patient_discharge
1 1 2020-10-20 2020-10-21
2 3 2021-10-21 2021-10-22
3 4 2022-11-25 2022-11-25
4 5 2022-11-25 2022-11-26
Data
df <- data.frame(Patient_ID = c(1,2,3,4,5,6),
Patient_admit = c("20/10/2020", "22/10/2021", "21/10/2021", "25/11/2022", "25/11/2022", "05/10/2020"),
Patient_discharge = c("21/10/2020", "20/10/2021", "22/10/2021", "25/11/2022", "26/11/2022", "20/09/2020"))