I have one dataframe (qtpo_liver_dates) of liver transplant patients and their procedure dates. In this dataframe, each patient has their own procedure date. Another dataframe (labs_v500) of the same cohort shows lab dates and results, and visit dates including the procedure date, but mostly dates not of interest. I want to merge the dataframes on the patient and procedure date in R (i.e. filter labs_v500 by the actual procedure date). I tried the following but returned a blank dataframe
df <- merge(qtpo_liver_dates,labs_v500, by = c("patient_num","liver_date"))
Could someone explain the best way to do this?
CodePudding user response:
You should first tell R that both liver_date
columns are dates. The function as.Date
let you do that.
So let`s say we got df1
and df2
date1<-(c("2007-08-01", "2004-10-05", "2014-03-09"))#Year - Month - Day
date2<-(c("8/1/07", "10/5/04", "3/9/14"))#Month/Day/Year
x<-(c(1:3))
z<-c(11:13)
w<-c(11:13)
df1<-data.frame(date1, x, z)
str(df1$data1)
df1
> df1
date1 x z
1 2007-08-01 1 11
2 2004-10-05 2 12
3 2014-03-09 3 13
df2<-data.frame(date2, x, w)
str(df2$date2)
df2
> df2
date2 x w
1 8/1/07 1 11
2 10/5/04 2 12
3 3/9/14 3 13
With as.Date
you tell the format of the column in which the dates are, for df1
is a Y-M-D
df1$date1<-as.Date.character(df1$date1,format="%Y-%m-%d")
str(df1$date1)
And for df2
is m/d/y
df2$date1<-as.Date.character(df2$date2,format="%m/%d/%y")
str(df2$date1)
We recode the df2$date2
in df2$date1
to match the name of the column, this will be needed by the merge
function later, in your case you could recode in the same column because they have the same name:
df3<-merge(df1,df2, by =c("date1", "x" ) )
df3
>df3
date1 x z date2 w
1 2004-10-05 2 12 10/5/04 12
2 2007-08-01 1 11 8/1/07 11
3 2014-03-09 3 13 3/9/14 13
As you can see, z
and w
match perfectly, so we know we did it right.
In your data:
df1
= qtpo_liver_dates
df2
= labs_v500
date1
, date2
= liver_date
x
= patient_num
z
= Some column in qtpo_liver_dates
w
= Some column in labs_v500
CodePudding user response:
It's best when you are posting questions here to include some sample data in a way that's easy to work with. Rather than a picture you can do dput(head(my_data_frame)
to get a sampling of your data that you can post in your question. Since you didn't do that I created a small subset of your data to illustrate a solution.
Here's the same data I created to simulate your problem:
qtpo_liver_dates <- data.frame(
patient_num = c(1, 2, 3),
liver_date = c("2007-08-01", "2004-10-05", "2014-03-09")
)
labs_v500 <- data.frame(
patient_num = c(1, 2, 3),
liver_date = c("8/1/2007", "10/5/2004", "3/9/2014"),
other_data = c("Other Data A", "Other Data B", "Other Data C")
)
As @jdobres mentioned, merge()
didn't work because the format of liver_date
varies in the two data sets. This is easily corrected, however, with functions from the lubridate
package.
We'll correct the date formats with two calls to lubridate
functions. In the first we'll use mdy()
which expects data in a month-day-year format. In the second we'll use ymd()
which looks for data in a year-month-day format. In both cases we'll replace the character strings in the liver_date
variables of your data frames with the new date variables.
library(lubridate)
qtpo_liver_dates$liver_date <- ymd(qtpo_liver_dates$liver_date)
labs_v500$liver_date <- mdy(labs_v500$liver_date)
Now merge()
should work, and since the common variable names are the same between the two data frames there's no need to specify them explicitly.
merge(qtpo_liver_dates, labs_v500)
Output:
patient_num liver_date other_data
1 1 2007-08-01 Other Data A
2 2 2004-10-05 Other Data B
3 3 2014-03-09 Other Data C