I am new to R and have two very large datasets I want to merge. They look as follows
ID year val1 val3
1 1 2001 2 34
2 2 2004 1 25
3 3 2003 3 36
4 4 2003 2 46
5 5 1999 1 55
6 6 2005 3 44
The second dataframe is as follows
ID year val2
1 1 2001 2
2 2 2004 1
3 3 2003 3
4 4 2002 5
5 5 1998 4
6 6 2004 6
I want the final merged set to look like this
ID year val1 val3 val2
1 1 2001 2 34 2
2 2 2004 1 25 1
3 3 2003 3 36 3
4 4 2002 NA NA 5
5 4 2003 2 46 NA
6 5 1998 NA NA 4
7 5 1999 1 55 NA
8 6 2004 NA NA 6
9 6 2005 3 44 NA
I tried merging by ID and year using the following
total <- merge(df1,df2,by=c("id","year"))
But this results in only merging if ID and year BOTH match. I want to it to happen so that if the ID matches but year doesn't match, a new row will add in the same ID the entry for year and val2 while leaving val1 and val3 as NA.
I then tried merging only by ID and then removing rows if year.x != year.y, but since the datasets were too large it wasn't very efficient.
CodePudding user response:
merge
has an argument all
that specifies if you want to keep all rows from left and right side (i.e. all rows from x and all rows from y)
total <- merge(df1,df2,by=c("id","year"), all=TRUE)
CodePudding user response:
You must specify all.x=TRUE and all.y=TRUE so you keep all unique rows from both datasets
total <- merge(df1,df2,by=c("id","year"),all.x=TRUE,all.y=TRUE)