Home > OS >  Is it possible to join data frames using something like full_join in R without duplicating column va
Is it possible to join data frames using something like full_join in R without duplicating column va

Time:11-11

I'm trying to combine to payment streams by date, however when I use full_join it duplicates some payment streams, making analysis impossible.

For Example when I run this code with example data:

library(tidyquant)

v1 = c("a", "a", "b", "b", "c", "c")
v2 = c(1,2,3,4,3,4)

v3 = c("a", "b", "c")
v4 = c(10,20,30)

df1 = data.frame(v1, v2)
df2 = data.frame(v3, v4)

full_join(df1, df2, by = c("v1" = "v3"))

I get this output:

  v1 v2 v4
1  a  1 10
2  a  2 10
3  b  3 20
4  b  4 20
5  c  3 30
6  c  4 30

But I want this:

  v1 v2 v4
1  a  1 10
2  a  2 NA
3  b  3 20
4  b  4 NA
5  c  3 30
6  c  4 NA

Is there any way, using dplyrs join or other functions to get the output I want.

CodePudding user response:

library(data.table)
#convert to data.table format
setDT(df1); setDT(df2)
#update join non-duplicated v1-values
df1[!duplicated(v1), v4 := df2[df1[!duplicated(v1),], v4, on = .(v3 = v1)]]

#    v1 v2 v4
# 1:  a  1 10
# 2:  a  2 NA
# 3:  b  3 20
# 4:  b  4 NA
# 5:  c  3 30
# 6:  c  4 NA    
  • Related