Home > OS >  R data.table explode vector column to rows
R data.table explode vector column to rows

Time:03-18

I have a data.table and a vector which I would like to add as a column of the dt - specifically, "explode" the values of vector to rows of the dt.

Edit: tidyr::expand_grid and tidyr::crossing give the desired result

The code below gives me what I want but it feels like there should be a much simpler way to do it.

Desired result

(transform example_data to modified_data)

library(data.table)

example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
print(example_data)
#>        id value
#>    <char> <num>
#> 1:      a     1
#> 2:      b     2
#> 3:      c     3

explode_vector <- c("foo", "bar")

dt_list <- list()
for (val in explode_vector) {
    temp_df <- copy(example_data)
    temp_df[, new_val := eval(val)]
    dt_list <- append(dt_list, list(temp_df))
}

modified_data <- rbindlist(dt_list)
print(modified_data)
#>        id value new_val
#>    <char> <num>  <char>
#> 1:      a     1     foo
#> 2:      b     2     foo
#> 3:      c     3     foo
#> 4:      a     1     bar
#> 5:      b     2     bar
#> 6:      c     3     bar

e.g. it feels like one of the two options below should work but they don't quite. Does anyone have a suggestion for doing this?

# Some examples that I feel should work
example_data[, new_val := eval(explode_vector), by=id]
cbind(example_data, data.table(new_val=explode_vector))

Edit: Extending with duplicate id values

For example, there may be multiple identification keys (e.g. time-series observations) so the primary id is duplicated.

library(data.table)

example_data <- data.table(id = c("a", "a", "b", "c"), date=c(0, 1, 0, 1), value=c(1,2,3,1))
print(example_data)
       id  date value
   <char> <num> <num>
1:      a     0     1
2:      a     1     2
3:      b     0     3
4:      c     1     1

explode_vector <- c("foo", "bar")

# A solution using the CJ method
example_data[, CJ(id, explode_vector, unique=TRUE)][example_data, on = .(id)]
       id explode_vector  date value
   <char>         <char> <num> <num>
1:      a            bar     0     1
2:      a            foo     0     1
3:      a            bar     1     2
4:      a            foo     1     2
5:      b            bar     0     3
6:      b            foo     0     3
7:      c            bar     1     1
8:      c            foo     1     1

# A solution using the expand_grid method (simpler in my opinion but requires 
# converting back to `data.table` if trying to stick to `data.table`)
# I also prefer that it retains the original column order.
data.table(expand_grid(example_data, explode_vector))
       id  date value explode_vector
   <char> <num> <num>         <char>
1:      a     0     1            foo
2:      a     0     1            bar
3:      a     1     2            foo
4:      a     1     2            bar
5:      b     0     3            foo
6:      b     0     3            bar
7:      c     1     1            foo
8:      c     1     1            bar

CodePudding user response:

You can use expand_grid from tidyr package.

library(tidyr)
library(data.table)

example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
explode_vector <- c("foo", "bar")
tidyr::expand_grid(example_data, explode_vector)

# A tibble: 6 x 3
#   id    value explode_vector
#   <chr> <dbl> <chr>
# 1 a         1 foo
# 2 a         1 bar
# 3 b         2 foo
# 4 b         2 bar
# 5 c         3 foo
# 6 c         3 bar

CodePudding user response:

In data.table an option is CJ

library(data.table)
example_data[, CJ(id, explode_vector)][example_data, on = .(id)]

-output

       id explode_vector value
   <char>         <char> <num>
1:      a            bar     1
2:      a            foo     1
3:      b            bar     2
4:      b            foo     2
5:      c            bar     3
6:      c            foo     3

If it is multiple columns, we may use crossing

library(tidyr)
crossing(example_data[, ..id_cols], explode_vector)
  • Related