I have a data table test
:
id | key |
---|---|
1 | 2365 |
1 | 2365 |
1 | 3709 |
2 | 6734 |
2 | 1908 |
2 | 4523 |
I want to aggregate unique key
values by id
into vector using data.table
package.
Expected output:
id | key_array |
---|---|
1 | "2365", "3709" |
2 | "6734", "1908", "4523" |
So, this should work like array_agg
sql function.
I tried:
res <- test[, list(key_array = paste(unique(key), collapse = ", ")), by = "id"]
, but I get just a string. But I need to have opportunity to find the length of each vector and operate with its certain elements (find the intersection of two vectors for example).
CodePudding user response:
1. Base R
This an aggregate
one-liner.
x <- 'id key
1 2365
1 2365
1 3709
2 6734
2 1908
2 4523'
test <- read.table(textConnection(x), header = TRUE)
aggregate(key ~ id, test, \(x) c(unique(x)))
#> id key
#> 1 1 2365, 3709
#> 2 2 6734, 1908, 4523
Created on 2022-06-14 by the reprex package (v2.0.1)
But if user @Chris's comment is right then the right solution as follows.
aggregate(key ~ id, test, \(x) paste(unique(x), collapse = ", "))
Note that both c(unique(x))
and as.character(c(unique(x)))
will output a list column, so the latter solution is right anyway.
2. Package data.table
Once again a one-liner.
The output is a list column, with each list member an integer vector. To keep as integers use
list(unique(key))
instead.
suppressPackageStartupMessages(library(data.table))
res <- setDT(test)[, .(key_array = list(as.character(unique(key)))), by = id]
res
#> id key_array
#> 1: 1 2365,3709
#> 2: 2 6734,1908,4523
str(res)
#> Classes 'data.table' and 'data.frame': 2 obs. of 2 variables:
#> $ id : int 1 2
#> $ key_array:List of 2
#> ..$ : chr "2365" "3709"
#> ..$ : chr "6734" "1908" "4523"
#> - attr(*, ".internal.selfref")=<externalptr>
Created on 2022-06-14 by the reprex package (v2.0.1)
Then, in order to access the vectors use two extractors, one to extract the column and the other one to extract the vectors.
res$key_array[[1]]
#> [1] "2365" "3709"
res$key_array[[2]]
#> [1] "6734" "1908" "4523"
Created on 2022-06-14 by the reprex package (v2.0.1)
3. dplyr
solution
Group by id
and collapse the unique strings into one only.
suppressPackageStartupMessages(library(dplyr))
test %>%
group_by(id) %>%
summarise(key_array = paste(unique(key), collapse = ", "))
#> # A tibble: 2 × 2
#> id key_array
#> <int> <chr>
#> 1 1 2365, 3709
#> 2 2 6734, 1908, 4523
Created on 2022-06-14 by the reprex package (v2.0.1)