I'm trying to combine to payment streams by date, however when I use full_join it duplicates some payment streams, making analysis impossible.
For Example when I run this code with example data:
library(tidyquant)
v1 = c("a", "a", "b", "b", "c", "c")
v2 = c(1,2,3,4,3,4)
v3 = c("a", "b", "c")
v4 = c(10,20,30)
df1 = data.frame(v1, v2)
df2 = data.frame(v3, v4)
full_join(df1, df2, by = c("v1" = "v3"))
I get this output:
v1 v2 v4
1 a 1 10
2 a 2 10
3 b 3 20
4 b 4 20
5 c 3 30
6 c 4 30
But I want this:
v1 v2 v4
1 a 1 10
2 a 2 NA
3 b 3 20
4 b 4 NA
5 c 3 30
6 c 4 NA
Is there any way, using dplyrs join or other functions to get the output I want.
CodePudding user response:
library(data.table)
#convert to data.table format
setDT(df1); setDT(df2)
#update join non-duplicated v1-values
df1[!duplicated(v1), v4 := df2[df1[!duplicated(v1),], v4, on = .(v3 = v1)]]
# v1 v2 v4
# 1: a 1 10
# 2: a 2 NA
# 3: b 3 20
# 4: b 4 NA
# 5: c 3 30
# 6: c 4 NA