As far as I know, the standard way to use the dcast function on a data.table object is to specify the variables (which will become column names) and the values (which will become the values of these columns).
I have a data.table with three columns: id, variable and value:
library(data.table)
dt <- data.table(id = c(1, 2, 1),
variable = c("var_1", "var_1", "var_2"),
value = c(100, 200, 300))
dt
#> id variable value
#> 1: 1 var_1 100
#> 2: 2 var_1 200
#> 3: 1 var_2 300
And I want this output, produced by dcast:
dt_wide <- dcast(dt, id ~ variable, value.var = "value")
dt_wide
#> id var_1 var_2
#> 1: 1 100 300
#> 2: 2 200 NA
But my question is: can I do this without specifying variable? I.e., can I use dcast and get the output above, starting from the object below?
dt[, variable := NULL]
dt
#> id value
#> 1: 1 100
#> 2: 2 200
#> 3: 1 300
# dcast(dt)? Result:
data.table(id = c(1, 2),
V1 = c(100, 200),
V2 = c(300, NA))
#> id V1 V2
#> 1: 1 100 300
#> 2: 2 200 NA
I can imagine this is theoretically possible. The algorithm could look like this:
- Starting from the top, take the first value for each id and put it into a newly created column (with an automatically chosen name).
- Take the second value for each id; if some id has no second value, put NA.
- And so on, until all values are taken.
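To make the intended result concrete, the steps above can be sketched in base R. This is only an illustration of the algorithm on the toy data, not a dcast call; the names by_id, n_cols and wide are made up for the example.

```r
# Toy data matching the example above (id and value only).
id    <- c(1, 2, 1)
value <- c(100, 200, 300)

# Group the values by id, keeping their original order within each id.
by_id <- split(value, id)

# The id with the most values determines how many columns are needed.
n_cols <- max(lengths(by_id))

# Pad each id's values with NA up to n_cols, then stack them as rows.
padded <- lapply(by_id, function(v) c(v, rep(NA, n_cols - length(v))))
wide <- do.call(rbind, padded)
colnames(wide) <- paste0("V", seq_len(n_cols))

wide
```

The result is a matrix with one row per id (the ids become row names) and columns V1, V2, so id 2 gets NA in V2.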
I'm asking because my data has only the id and value columns, and I want to do this without additional computation (i.e. without adding a new column).
In my case dcast is really fast, and I have found that adding a new column is more computationally expensive than performing the dcast itself, so I would like to avoid it. Although maybe dcast is so fast precisely because it uses this variable column :)
CodePudding user response:
With dcast, we can create the formula on the fly, using an expression built with paste0 and rowid:
library(data.table)
# value.var is given explicitly so dcast does not have to guess it
dcast(dt, id ~ paste0('var_', rowid(id)), value.var = "value")
Output:
id var_1 var_2
1: 1 100 300
2: 2 200 NA
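For completeness, here is a self-contained sketch of the same idea, starting from a table that has only id and value. It assumes the data.table package is installed; the "V" prefix is just a choice made to match the column names asked for in the question.

```r
library(data.table)

# Only id and value, as in the question after dropping 'variable'.
dt <- data.table(id = c(1, 2, 1), value = c(100, 200, 300))

# rowid() numbers the rows within each id, in order of appearance,
# which is exactly the "first value, second value, ..." index.
idx <- rowid(dt$id)
idx

# Use that index inside the formula; no new column is added to dt.
dt_wide <- dcast(dt, id ~ paste0("V", rowid(id)), value.var = "value")
dt_wide
```

Because the generated names are character strings, the columns are likely to come out in alphabetical rather than numeric order once an id has ten or more values (V1, V10, V2, ...). Using id ~ rowid(id) without the prefix is one way around that, at the cost of column names that are plain numbers.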