I have a large data.table object where one variable is a list of character vectors. I would like to aggregate by a unique ID and, in the process, combine all character vectors associated with each row of that unique ID. Here is a simple reproducible example:
library(data.table)
DT <- data.table(ID = c(LETTERS[1:10], LETTERS[1:10]),
                 var = replicate(n = 20,
                                 expr = sample(x = letters, size = 5, replace = FALSE),
                                 simplify = FALSE))
str(DT)
I've tried both the aggregate function and adapting more specific data.table notation. The aggregate function can't handle lists, and I can't figure out the data.table notation for list variables:
appended <- aggregate(var~ID, data = DT, FUN = "append")
appended <- DT[, .(var=append(var), ID=ID[1]), by="ID"]
Ideally, my output would be of the structure:
> str(appended)
Classes ‘data.table’ and 'data.frame': 10 obs. of 2 variables:
$ ID : chr "A" "B" "C" "D" ...
$ var:List of 10
..
I don't mind if elements within each of the appended vectors (appended$var) are repeated. I plan on removing duplicates from within each vector later anyway, so if duplicates are removed as a side effect of the appending/aggregating process, I'm OK with that too.
Any solutions or even just links to specific documentation on this case that I haven't yet found?
Answer:
You could use append() with Reduce():
DT[, .(var = list(Reduce(append, var))), by = ID]
But I would suggest unlist() instead:
DT[, .(var = list(unlist(var))), by = ID]
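For reference, here is a minimal end-to-end sketch of the unlist() approach on the example data from the question. The set.seed() call and the names appended and deduped are my own additions for illustration, and the second step assumes you want duplicates dropped with unique() in the same pass rather than later:

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
DT <- data.table(
  ID  = c(LETTERS[1:10], LETTERS[1:10]),
  var = replicate(n = 20,
                  expr = sample(x = letters, size = 5, replace = FALSE),
                  simplify = FALSE)
)

# Flatten each group's list of vectors into one character vector, then wrap it
# in list() so the result stays a single list-column element per ID.
appended <- DT[, .(var = list(unlist(var))), by = ID]

# Optional variant: remove duplicates within each combined vector in the same step.
deduped <- DT[, .(var = list(unique(unlist(var)))), by = ID]

str(appended)
```

The outer list() is what keeps var as a list column with one row per ID; without it, data.table would expand each group's combined vector into one row per element.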