Home > OS >  Effective appending data to existing data.table by joining and coalescing
Effective appending data to existing data.table by joining and coalescing

Time:12-09

I have two data.tables: dt_main is the main one, to which I need to append information of only one column from dt_add:

dt_main <- data.table(id    = c(1:5)
                  , name = c("a", "b", NA,NA,NA)
                  , stuff = c(11:15))

dt_add <- data.table(id = c(4:5)
                  , name = c("aaa", "bbb"))

I got the job correctly done by first joining then coalescing:

dt_main_final <- dt_add[dt_main, on = "id"]

dt_main_final[, name := fcoalesce(name, i.name)][, i.name:=NULL]

The provided output is as expected:

   id name stuff
1:  1    a    11
2:  2    b    12
3:  3 <NA>    13
4:  4  aaa    14
5:  5  bbb    15

I wander whether there is a more direct way to have it done, any suggestions? Thanks.

PS> I also tried melting then dcasting:

dt <- dt_add[dt_main, on = "id"]
setnames(dt, "i.name", "name")

dt_melt <- melt(dt
                , measure.vars = patterns("name")
                )

dt_main_final <- dcast(dt_melt
                      , id   stuff ~ variable
                      , fun.aggregate =  fcoalesce                      
                      , value.var = "value")

I got the error:

Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

Any ideas for this one also?

CodePudding user response:

We can do the join and coalesce in one step:

dt_main[dt_add, name := fcoalesce(name, i.name), on = .(id)]
dt_main
#       id   name stuff
#    <int> <char> <int>
# 1:     1      a    11
# 2:     2      b    12
# 3:     3   <NA>    13
# 4:     4    aaa    14
# 5:     5    bbb    15
  • Related