Home > Enterprise >  R data.table join with concatenation of rows for a column
R data.table join with concatenation of rows for a column

Time:12-21

I want to join two data.tables, taking a character column from one and loading it as a concatenation of character columns of matched rows.

To illustrate--starting with two tables:

> dt1<-data.table(rbind(c(Id=1, Name="Apple"),c(Id=1, Name="Orange"), c(Id=2, Name="Banana")))
> dt1
   Id   Name
1:  1  Apple
2:  1 Orange
3:  2 Banana

> dt2<-data.table(rbind(c(Id=1, Flavor="Sweet"),c(Id=2, Flavor="Bland")))
> dt2
   Id Flavor
1:  1  Sweet
2:  2  Bland

A third data.table should have "Orange" and "Apple" concatenated together for the row with Id 1, looking like this:

> dt3<-data.table(rbind(c(Id=1, Flavor="Sweet", Names="Apple; Orange"),c(Id=2, Flavor="Bland", Names="Banana")))
> dt3
   Id Flavor         Names
1:  1  Sweet Apple; Orange
2:  2  Bland        Banana

I tried a few things but none of these concatenate.

> dt1[dt2, .(Id, Flavor, Names=paste(Name, sep="; ")), on=.(Id)]
   Id Flavor  Names
1:  1  Sweet  Apple
2:  1  Sweet Orange
3:  2  Bland Banana

> dt1[dt2, .(Id, Flavor, Name), on=.(Id)][, .(Names=paste(Name, sep="; ")), keyby=.(Id, Flavor)]
   Id Flavor  Names
1:  1  Sweet  Apple
2:  1  Sweet Orange
3:  2  Bland Banana

CodePudding user response:

An approach using join and grouping with unique.

dt1[dt2, on = "Id"][, .(Flavor = .(unique(Flavor)), Name = .(Name)), by = "Id"]
   Id Flavor         Name
1:  1  Sweet Apple,Orange
2:  2  Bland       Banana

CodePudding user response:

Per the comment from @thelatemail, the way to do it is with collapse= in the call to paste()

> dt3<-dt1[dt2, .(Id, Flavor, Name), on=.(Id)][, .(Names=paste(Name, collapse="; ")), keyby=.(Id, Flavor)]
> dt3
   Id Flavor         Names
1:  1  Sweet Apple; Orange
2:  2  Bland        Banana
  • Related