How to append character vectors in a list variable of a data.table when aggregating by an id variable

Time:01-27

I have a large data.table object where one variable is a list of character vectors. I would like to aggregate by a unique ID and, in the process, combine all character vectors associated with each row of that unique ID. Here is a simple reproducible example:

library(data.table)

DT <- data.table(ID = c(LETTERS[1:10], LETTERS[1:10]),
                 var = replicate(n = 20,
                                 expr = sample(x = letters, size = 5, replace = FALSE),
                                 simplify = FALSE))
str(DT)

I've tried both the aggregate function and adapting data.table syntax. The aggregate function can't handle list columns, and I can't figure out the data.table syntax for list variables:

appended <- aggregate(var~ID, data = DT, FUN = "append")
appended <- DT[, .(var=append(var), ID=ID[1]), by="ID"]

Ideally, my output would be of the structure:

> str(appended)
Classes ‘data.table’ and 'data.frame':  10 obs. of  2 variables:
 $ ID : chr  "A" "B" "C" "D" ...
 $ var:List of 10
    ..

I don't mind if elements within each of the appended vectors (appended$var) are repeated. I plan on removing duplicates from within each vector later, so if repetition is a side effect of the appending/aggregating process, I'm ok with that.

Any solutions or even just links to specific documentation on this case that I haven't yet found?

CodePudding user response:

You could use append() with Reduce():

DT[, .(var = list(Reduce(append, var))), by = ID]

But I would suggest unlist() instead, which flattens each group's list in a single call:

DT[, .(var = list(unlist(var))), by = ID]
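
For completeness, here is a minimal sketch of the whole workflow, including the de-duplication step the question says it plans to do later. It wraps the unlist() result in list() so that each ID gets a single list-column element. The fixed seed and the use of unique() for de-duplication are my own assumptions for illustration, so adapt them as needed:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Same shape as the example data: 20 rows, each ID appears twice
DT <- data.table(ID = rep(LETTERS[1:10], 2),
                 var = replicate(20, sample(letters, 5), simplify = FALSE))

# One row per ID; each list element holds that ID's combined vectors
appended <- DT[, .(var = list(unlist(var))), by = ID]

# Optional: drop repeated letters inside each combined vector
appended[, var := lapply(var, unique)]

str(appended)
```

Wrapping the result in list() matters because, without it, data.table would expand the unlisted vector into one row per element instead of keeping one row per ID.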