stack two columns into one (data.table) in r-CodePudding

i have the following data:

ID	A	B
1	4	2
2	5	3

The result I want is:

ID	C
1	4
1	2
2	5
2	3

CodePudding user response：

The cdata package has transformation functions that pivot and unpivot with multiple variables. What I like about the package is the control table idea that visually arranges the data in the form you want to see.

In this case, starting with a data frame df,

ID <- c(1, 2)
A <- c(4, 5)
B <- c(2, 3)
df <- data.table(ID, A, B)


# check the data 
df[]

#>   ID A B
#> 1  1 4 2
#> 2  2 5 3

Build a cdata control table. This is the basic element that allows you to unambiguously assign which data values go where. In this case, I use a new variable from to identify the columns from which values are taken and C is the new column with the desired values.

# build a control table
from <- c("col_A", "col_B")
C    <- c("A", "B")
control_table <- data.table(from, C)

# examine the result
control_table[]
#>     from C
#> 1: col_A A
#> 2: col_B B

With the control table ready, I can use rowrecs_to_blocks() from cdata to transform the data from row record (wide) form to block record (longer) form. The ID variable is repeated as many times as necessary to complete the block records.

# transform to block form
DT <- cdata::rowrecs_to_blocks(
  wideTable     = df, 
  controlTable  = control_table,
  columnsToCopy = c("ID"))
setDT(DT)

# examine the result
DT[]
#>    ID  from C
#> 1:  1 col_A 4
#> 2:  1 col_B 2
#> 3:  2 col_A 5
#> 4:  2 col_B 3

You can omit the from column, though by keeping it you can always recover the original data frame if you need to.

# omit the from column
DT[, from := NULL]
DT
#>    ID C
#> 1:  1 4
#> 2:  1 2
#> 3:  2 5
#> 4:  2 3

CodePudding user response：

Using data.table where melt is the dplyr's equivalent of pivot_longer

dt <- data.table(ID = 1:2, A = 4:5, B = 2:3)

dt <- melt(dt, measure.vars = c("A", "B"), value.name = "C")
dt[, variable := NULL]
setorder(dt, ID) # to get the same order as your desired output

dt
#    ID C
# 1:  1 4
# 2:  1 2
# 3:  2 5
# 4:  2 3

CodePudding user response：

Just use by.

DT[, .(C = c(A, B)), by = ID]
#       ID     C
#    <int> <int>
# 1:     1     4
# 2:     1     2
# 3:     2     5
# 4:     2     3

Reproducible data

DT = data.table(ID = 1:2, A = 4:5, B = 2:3)