Home > database >  Merging two dataframes by two columns resulting in blank df
Merging two dataframes by two columns resulting in blank df

Time:03-11

I have one dataframe (qtpo_liver_dates) of liver transplant patients and their procedure dates. In this dataframe, each patient has their own procedure date. Another dataframe (labs_v500) of the same cohort shows lab dates and results, and visit dates including the procedure date, but mostly dates not of interest. I want to merge the dataframes on the patient and procedure date in R (i.e. filter labs_v500 by the actual procedure date). I tried the following but returned a blank dataframe

df <- merge(qtpo_liver_dates,labs_v500, by = c("patient_num","liver_date"))

Could someone explain the best way to do this?

enter image description here

enter image description here

CodePudding user response:

You should first tell R that both liver_date columns are dates. The function as.Date let you do that.

So let`s say we got df1 and df2

date1<-(c("2007-08-01", "2004-10-05", "2014-03-09"))#Year - Month - Day
date2<-(c("8/1/07", "10/5/04", "3/9/14"))#Month/Day/Year 
x<-(c(1:3))
z<-c(11:13)
w<-c(11:13)


df1<-data.frame(date1, x, z)
str(df1$data1)
  
df1

> df1
       date1 x  z
1 2007-08-01 1 11
2 2004-10-05 2 12
3 2014-03-09 3 13

df2<-data.frame(date2, x, w)
str(df2$date2)

df2 
> df2
    date2 x  w
1  8/1/07 1 11
2 10/5/04 2 12
3  3/9/14 3 13

With as.Date you tell the format of the column in which the dates are, for df1 is a Y-M-D

df1$date1<-as.Date.character(df1$date1,format="%Y-%m-%d")
str(df1$date1)

And for df2 is m/d/y

df2$date1<-as.Date.character(df2$date2,format="%m/%d/%y")
str(df2$date1)

We recode the df2$date2 in df2$date1 to match the name of the column, this will be needed by the merge function later, in your case you could recode in the same column because they have the same name:

df3<-merge(df1,df2, by =c("date1", "x" ) )
df3

>df3
       date1 x  z   date2  w
1 2004-10-05 2 12 10/5/04 12
2 2007-08-01 1 11  8/1/07 11
3 2014-03-09 3 13  3/9/14 13

As you can see, z and w match perfectly, so we know we did it right.

In your data:

df1 = qtpo_liver_dates

df2 = labs_v500

date1, date2 = liver_date

x = patient_num

z = Some column in qtpo_liver_dates

w = Some column in labs_v500

CodePudding user response:

It's best when you are posting questions here to include some sample data in a way that's easy to work with. Rather than a picture you can do dput(head(my_data_frame) to get a sampling of your data that you can post in your question. Since you didn't do that I created a small subset of your data to illustrate a solution.

Here's the same data I created to simulate your problem:

qtpo_liver_dates <- data.frame(
  patient_num = c(1, 2, 3),
  liver_date = c("2007-08-01", "2004-10-05", "2014-03-09")
)

labs_v500 <- data.frame(
  patient_num = c(1, 2, 3),
  liver_date = c("8/1/2007", "10/5/2004", "3/9/2014"),
  other_data = c("Other Data A", "Other Data B", "Other Data C")
)

As @jdobres mentioned, merge() didn't work because the format of liver_date varies in the two data sets. This is easily corrected, however, with functions from the lubridate package.

We'll correct the date formats with two calls to lubridate functions. In the first we'll use mdy() which expects data in a month-day-year format. In the second we'll use ymd() which looks for data in a year-month-day format. In both cases we'll replace the character strings in the liver_date variables of your data frames with the new date variables.

library(lubridate)

qtpo_liver_dates$liver_date <- ymd(qtpo_liver_dates$liver_date)
labs_v500$liver_date <- mdy(labs_v500$liver_date)

Now merge() should work, and since the common variable names are the same between the two data frames there's no need to specify them explicitly.

merge(qtpo_liver_dates, labs_v500)

Output:

  patient_num liver_date   other_data
1           1 2007-08-01 Other Data A
2           2 2004-10-05 Other Data B
3           3 2014-03-09 Other Data C
  • Related