I have 2 dataframes: wd_patients
and wd_appointments
.
How do I check if all patients in wd_patients are also listed in wd_appointments and vice versa. And then how can I update these two files such that every patient in one dataset also exists in the other one as well?
Please see the structures of my dataframes below:
str(wd_patients)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 990 obs. of 5 variables:
$ ID : chr "QJSkgvEFvP" "UwSKF6tmaG" "TgO3Lo9cR6" "DFDaFnBxU9" ...
$ Age : num 63 64 60 76 66 68 69 87 74 85 ...
$ Gender : chr "Male" "Female" "Female" "Female" ...
$ AppointmentCount : num 3 3 3 3 5 2 2 2 5 2 ...
$ TreatmentCompleted: chr "Yes" "No" "No" "Yes" ...
str(wd_appointments)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3473 obs. of 3 variables:
$ patientID : chr "IEBQggWteS" "nCOytLO2tK" "Kc7D1GZhqr" "GTyawkFnQn" ...
$ apptDate : POSIXct, format: "2022-06-01" "2022-06-01" "2022-06-01" "2022-06-01" ...
$ apptStatus: chr "No Show" "No Show" "Cancelled" "Cancelled" ...
CodePudding user response:
You can start by finding patient ids that are in wd_patients and not wd_appointments and vice versa...
ids1 <- setdiff(wd_patients$ID, wd_appointments$patientID) # ids in wd_patients but not wd_appointments
ids2 <- setdiff(wd_appointments$patientID, wd_patients$ID) # ids in wd_appointments but not wd_patients
After that you can decide how you want to synchronize the two data.frames
CodePudding user response:
Using the base R you can use 'merge' function and you obtain only those observations which are existing in both datasets.
mergedData <- merge(wd_patients, wd_appointments , by.x=c('ID'), by.y=c('patientID'))
If you don't want join other variables from the second data set, then you can use the following code:
mergedData <- merge(wd_patients, wd_appointments[,c('patientID')] , by.x=c('ID'), by.y=c('patientID'))
CodePudding user response:
You can do a left_join
:
library(dplyr)
left_join(wd_patients, wd_appointments, by = c("ID", "patientID"))
This will join wd_appointments
to wd_patients
where there is a shared ID.
EDIT:
If you want to retain only the rows with matches in both data frames, use inner_join
.