Home > Enterprise >  Adding columns to inner join (data.table)
Adding columns to inner join (data.table)

Time:04-21

I am adding columns to an inner join and do not understanding the result.

Consider tables A,B:

A <- data.table(id=c(1,2,3), x_val = c("x1", "x2", "x3"))

    id x_val  
# 1:  1    x1  
# 2:  2    x2 
# 3:  3    x3

B <- data.table(id=c(1,2,4), y_val = c("y1", "y2", "y3"))

#    id y_val
# 1:  1    y1
# 2:  2    y2
# 3:  4    y3

Now consider these joins, first two make complete sense.

A[B, on=.(id)]

# rows=3 This join is what I expect.  The last row of B is included, but col A no has match. 

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2
# 3:     4   <NA>     y3
#


A[B, on=.(id), nomatch=NULL]

#   rows=2 To remove the unmatching row use nomatch=NULL (ie inner join)

#       id  x_val  y_val
#    <num> <char> <char>
# 1:     1     x1     y1
# 2:     2     x2     y2

Now the surprise.
Done with rows, now focus on columns:
In the cases below each of the columns and labels are expected, but the number of rows is not. I expect 2 rows in each case.

What am I missing?

A[B, .(A.id = A$id), on=.(id), nomatch=NULL]

#     A.id
#    <num>
# 1:     1
# 2:     2
# 3:     3
A[B, .(B$id), on=.(id), nomatch=NULL]#
#       V1
#    <num>
# 1:     1
# 2:     2
# 3:     4
A[B, .(A.id = A$id, B.id= B$id,  A.x_val = A$x_val, B$y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val     V4
#    <num> <num>  <char> <char>
# 1:     1     1      x1     y1
# 2:     2     2      x2     y2
# 3:     3     4      x3     y3

CodePudding user response:

To select columns while merging two data.tables (like you are doing), you should not use a dollar symbol. You can prefix the names of the columns in A and B (in the merge of the A[B]) by x. and i. respectively (see below).
What you're missing is that, in your examples, you are selecting the columns in the original dataset (which has 3 rows) and not in the (inner)joined dataset which has 2 rows.

A[B, .(A.id = x.id), on=.(id), nomatch=NULL]  # prefixing id with x. select id from A
#     A.id
# 1:     1
# 2:     2
A[B, .(i.id), on=.(id), nomatch=NULL]         # prefixing id with i. select id from B
#     i.id
# 1:     1
# 2:     2
A[B, .(A.id = x.id, B.id= i.id,  A.x_val = x.x_val, i.y_val), on=.(id), nomatch=NULL]
#     A.id  B.id A.x_val i.y_val
# 1:     1     1      x1      y1
# 2:     2     2      x2      y2
  • Related