Is it possible to use dcast without variable column? [duplicate]

Time:10-01

As far as I know, the standard way to use dcast on a data.table is to specify a variable column (whose values become the new column names) and a value column (whose values fill those columns).

I have a data.table with three columns - id, variable and value:

library(data.table)

dt <- data.table(id = c(1, 2, 1),
                 variable = c("var_1", "var_1", "var_2"),
                 value = c(100, 200, 300))

dt

#>    id variable value
#> 1:  1    var_1   100
#> 2:  2    var_1   200
#> 3:  1    var_2   300

And I want this output provided by dcast:

dt_wide <- dcast(dt, id ~ variable, value.var = "value")

dt_wide
#>    id var_1 var_2
#> 1:  1   100   300
#> 2:  2   200    NA

But my question is: can I do this without specifying variable? That is, can I use dcast and get the output above from the object below?

dt[, variable := NULL]

dt
#>    id value
#> 1:  1   100
#> 2:  2   200
#> 3:  1   300

# dcast(dt)? Result:

data.table(id = c(1, 2),
           V1 = c(100, 200),
           V2 = c(300, NA))
#>    id  V1  V2
#> 1:  1 100 300
#> 2:  2 200  NA

I imagine this is theoretically possible; the algorithm could look like this:

  1. Starting from top, take the first value from each id and put it into newly created column (choose name automatically).
  2. Take the second value for each id - if nothing for some id, put NA.
  3. And so on, until all values are taken.
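The steps above can be sketched without dcast, as a minimal illustration only. It assumes that "first", "second", and so on mean order of appearance within each id, and the names pos, ids, wide and result are hypothetical helpers introduced for this example:

```r
library(data.table)

dt <- data.table(id = c(1, 2, 1), value = c(100, 200, 300))

# Position of each row within its id, in order of appearance: 1, 1, 2
pos <- rowid(dt$id)
ids <- unique(dt$id)

# One row per id and one column per position, pre-filled with NA
wide <- matrix(NA_real_, nrow = length(ids), ncol = max(pos),
               dimnames = list(NULL, paste0("V", seq_len(max(pos)))))

# Put each value at (row of its id, its position within that id)
wide[cbind(match(dt$id, ids), pos)] <- dt$value

result <- cbind(data.table(id = ids), as.data.table(wide))
result
```

Note that the within-id position still has to be computed (here as the vector pos), even though it is never stored as a column of dt.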

I'm asking because my data has only the id and value columns, and I want to do this without additional computation (i.e. without adding a new column).

In my case dcast is really fast, and I have found that adding a new column is more computationally expensive than the dcast itself, so I would like to avoid it. Although maybe dcast is so fast precisely because it uses this variable column :)

CodePudding user response:

With dcast, the formula can contain an expression, so the column labels can be built on the fly with paste0 and rowid, which numbers the rows within each id:

library(data.table)
dcast(dt, id ~ paste0('var_', rowid(id)), value.var = "value")

Output:

   id var_1 var_2
1:  1   100   300
2:  2   200    NA
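As a variation, the V1/V2 column names from the question can probably be obtained with the prefix argument of rowid. This is a sketch under the assumption that the rows of each id are already in the desired order; expected is a hypothetical name for the target table from the question:

```r
library(data.table)

dt <- data.table(id = c(1, 2, 1), value = c(100, 200, 300))

# rowid(id) numbers rows within each id; prefix = "V" turns 1, 2 into "V1", "V2"
dt_wide <- dcast(dt, id ~ rowid(id, prefix = "V"), value.var = "value")

# Target table from the question, used only for comparison
expected <- data.table(id = c(1, 2), V1 = c(100, 200), V2 = c(300, NA))

# Compare the contents, ignoring attributes such as the key that dcast sets
all.equal(dt_wide, expected, check.attributes = FALSE)
```

Passing value.var explicitly avoids relying on dcast guessing which column holds the values.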