I am adding columns to an inner join and do not understanding the result.
Consider tables A,B:
A <- data.table(id=c(1,2,3), x_val = c("x1", "x2", "x3"))
id x_val
# 1: 1 x1
# 2: 2 x2
# 3: 3 x3
B <- data.table(id=c(1,2,4), y_val = c("y1", "y2", "y3"))
# id y_val
# 1: 1 y1
# 2: 2 y2
# 3: 4 y3
Now consider these joins, first two make complete sense.
A[B, on=.(id)]
# rows=3 This join is what I expect. The last row of B is included, but col A no has match.
# id x_val y_val
# <num> <char> <char>
# 1: 1 x1 y1
# 2: 2 x2 y2
# 3: 4 <NA> y3
#
A[B, on=.(id), nomatch=NULL]
# rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)
# id x_val y_val
# <num> <char> <char>
# 1: 1 x1 y1
# 2: 2 x2 y2
Now the surprise.
Done with rows, now focus on columns
:
In the cases below each of the columns
and labels are expected, but the number of rows is not. I expect 2 rows in each case.
What am I missing?
A[B, .(A.id = A$id), on=.(id), nomatch=NULL]
# A.id
# <num>
# 1: 1
# 2: 2
# 3: 3
A[B, .(B$id), on=.(id), nomatch=NULL]#
# V1
# <num>
# 1: 1
# 2: 2
# 3: 4
A[B, .(A.id = A$id, B.id= B$id, A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
# A.id B.id A.x_val V4
# <num> <num> <char> <char>
# 1: 1 1 x1 y1
# 2: 2 2 x2 y2
# 3: 3 4 x3 y3
CodePudding user response:
To select columns while merging two data.table
s (like you are doing), you should not use a dollar symbol. You can prefix the names of the columns in A and B (in the merge of the A[B]
) by x.
and i.
respectively (see below).
What you're missing is that, in your examples, you are selecting the columns in the original dataset (which has 3 rows) and not in the (inner)joined dataset which has 2 rows.
A[B, .(A.id = x.id), on=.(id), nomatch=NULL] # prefixing id with x. select id from A
# A.id
# 1: 1
# 2: 2
A[B, .(i.id), on=.(id), nomatch=NULL] # prefixing id with i. select id from B
# i.id
# 1: 1
# 2: 2
A[B, .(A.id = x.id, B.id= i.id, A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
# A.id B.id A.x_val i.y_val
# 1: 1 1 x1 y1
# 2: 2 2 x2 y2