I have a data.table and a vector which I would like to add as a column of the dt - specifically, "explode" the values of vector to rows of the dt.
Edit: tidyr::expand_grid
and tidyr::crossing
give the desired result
The code below gives me what I want but it feels like there should be a much simpler way to do it.
Desired result
(transform example_data
to modified_data
)
library(data.table)
example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
print(example_data)
#> id value
#> <char> <num>
#> 1: a 1
#> 2: b 2
#> 3: c 3
explode_vector <- c("foo", "bar")
dt_list <- list()
for (val in explode_vector) {
temp_df <- copy(example_data)
temp_df[, new_val := eval(val)]
dt_list <- append(dt_list, list(temp_df))
}
modified_data <- rbindlist(dt_list)
print(modified_data)
#> id value new_val
#> <char> <num> <char>
#> 1: a 1 foo
#> 2: b 2 foo
#> 3: c 3 foo
#> 4: a 1 bar
#> 5: b 2 bar
#> 6: c 3 bar
e.g. it feels like one of the two options below should work but they don't quite. Does anyone have a suggestion for doing this?
# Some examples that I feel should work
example_data[, new_val := eval(explode_vector), by=id]
cbind(example_data, data.table(new_val=explode_vector))
Edit: Extending with duplicate id
values
For example, there may be multiple identification keys (e.g. time-series observations) so the primary id
is duplicated.
library(data.table)
example_data <- data.table(id = c("a", "a", "b", "c"), date=c(0, 1, 0, 1), value=c(1,2,3,1))
print(example_data)
id date value
<char> <num> <num>
1: a 0 1
2: a 1 2
3: b 0 3
4: c 1 1
explode_vector <- c("foo", "bar")
# A solution using the CJ method
example_data[, CJ(id, explode_vector, unique=TRUE)][example_data, on = .(id)]
id explode_vector date value
<char> <char> <num> <num>
1: a bar 0 1
2: a foo 0 1
3: a bar 1 2
4: a foo 1 2
5: b bar 0 3
6: b foo 0 3
7: c bar 1 1
8: c foo 1 1
# A solution using the expand_grid method (simpler in my opinion but requires
# converting back to `data.table` if trying to stick to `data.table`)
# I also prefer that it retains the original column order.
data.table(expand_grid(example_data, explode_vector))
id date value explode_vector
<char> <num> <num> <char>
1: a 0 1 foo
2: a 0 1 bar
3: a 1 2 foo
4: a 1 2 bar
5: b 0 3 foo
6: b 0 3 bar
7: c 1 1 foo
8: c 1 1 bar
CodePudding user response:
You can use expand_grid
from tidyr package.
library(tidyr)
library(data.table)
example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
explode_vector <- c("foo", "bar")
tidyr::expand_grid(example_data, explode_vector)
# A tibble: 6 x 3
# id value explode_vector
# <chr> <dbl> <chr>
# 1 a 1 foo
# 2 a 1 bar
# 3 b 2 foo
# 4 b 2 bar
# 5 c 3 foo
# 6 c 3 bar
CodePudding user response:
In data.table
an option is CJ
library(data.table)
example_data[, CJ(id, explode_vector)][example_data, on = .(id)]
-output
id explode_vector value
<char> <char> <num>
1: a bar 1
2: a foo 1
3: b bar 2
4: b foo 2
5: c bar 3
6: c foo 3
If it is multiple columns, we may use crossing
library(tidyr)
crossing(example_data[, ..id_cols], explode_vector)