Home > database >  Any more concise `data.table` way to unnest a nested column in data.table?
Any more concise `data.table` way to unnest a nested column in data.table?

Time:09-14

Assuming we have a data.table with a nested column val

dt <- data.table(
  grp = c(1, 2, 1, 3, 4),
  val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)

which shows as

> dt
   grp   val
1:   1     a
2:   2   b,c
3:   1 d,e,f
4:   3     g
5:   4   h,i

Question

I would like to unnest the val column, where a possible option is using tidyr::unnest, i.e.,

> dt %>%
    unnest(val)
# A tibble: 9 × 2
    grp val
  <dbl> <chr>
1     1 a
2     2 b
3     2 c
4     1 d
5     1 e
6     1 f
7     3 g
8     4 h
9     4 i

I am wondering if we can implement it by using the data.table only.

Watch out the order of values in the column grp, I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4.


My data.table Attempt

My attempt is as below

> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

or

> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

but I guess there might be some more concise and elegant way to do this, e.g., without creating an auxiliary column id or using rep lengths.

Any idea? Much appreciated!

CodePudding user response:

One more option:

dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
#      grp    val
#    <num> <char>
# 1:     1      a
# 2:     2      b
# 3:     2      c
# 4:     1      d
# 5:     1      e
# 6:     1      f
# 7:     3      g
# 8:     4      h
# 9:     4      i

PS. If you are using data.table 1.4.2 or older by = .I won't work. Instead you can use:

dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]

PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.

CodePudding user response:

I guess this works:

dt[, lapply(.SD, unlist), by = .(grp)]

   grp val
1:   1   a
2:   1   d
3:   1   e
4:   1   f
5:   2   b
6:   2   c
7:   3   g
8:   4   h
9:   4   i
  • Related