Home > Blockchain >  How to summarize by ID from two different data frames using data.table
How to summarize by ID from two different data frames using data.table

Time:06-10

I have multiple data sets that I have been combining into one summary data frame in R. I am at the last step, but I am stymied.

Here is a dummy scenario: Imagine the two data frames df1 and df2 being

df1           df2
ID Make       ID Max
1  A          1  2
1  B          2  3
2  A          3  5
2  B
2  B
2  C
3  B

I want to show all unique instances of "Make" for each ID and append that to df2 The resulting df2 should look like:

df2
ID Max Make
1  2   A,B
2  3   A,B,C
3  5   B

My attempts have not worked. This is the closest I have gotten:

df2[df1
     , ':=' (Make = paste(Make, collapse = ","))
     , by = .EACHI
     , on = .(ID = ID)]

   ID Max Make
1:  1   2    B
2:  2   3    C
3:  3   5    B

Any help is appreciated.

CodePudding user response:

It may be easier with a two step process - i.e summarise the 'df1' data Make column by pasteing the unique elements by 'ID' and do a join with df2 on the 'ID' column

library(data.table)
df2[df1[, .(Make = toString(unique(Make))), ID], on = .(ID)]
      ID   Max    Make
   <num> <num>  <char>
1:     1     2    A, B
2:     2     3 A, B, C
3:     3     5       B

Or if we want to use .EACHI, instead of doing the assignment (:=), summarise the output within a list (or .()

df1[df2, .(Max, Make = toString(unique(Make))), on = .(ID), by = .EACHI]
      ID   Max    Make
   <num> <num>  <char>
1:     1     2    A, B
2:     2     3 A, B, C
3:     3     5       B

NOTE: Here we assume both datasets to be data.table objects

data

df1 <- structure(list(ID = c(1, 1, 2, 2, 2, 2, 3), Make = c("A", "B", 
"A", "B", "B", "C", "B")), row.names = c(NA, -7L), class = c("data.table", 
"data.frame"))

df2 <- structure(list(ID = c(1, 2, 3), Max = c(2, 3, 5)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"))
  • Related