i have the following data:
ID | A | B |
---|---|---|
1 | 4 | 2 |
2 | 5 | 3 |
The result I want is:
ID | C |
---|---|
1 | 4 |
1 | 2 |
2 | 5 |
2 | 3 |
CodePudding user response:
The cdata package has transformation functions that pivot and unpivot with multiple variables. What I like about the package is the control table idea that visually arranges the data in the form you want to see.
In this case, starting with a data frame df
,
ID <- c(1, 2)
A <- c(4, 5)
B <- c(2, 3)
df <- data.table(ID, A, B)
# check the data
df[]
#> ID A B
#> 1 1 4 2
#> 2 2 5 3
Build a cdata control table. This is the basic element that allows you to unambiguously assign which data values go where. In this case, I use a new variable from
to identify the columns from which values are taken and C
is the new column with the desired values.
# build a control table
from <- c("col_A", "col_B")
C <- c("A", "B")
control_table <- data.table(from, C)
# examine the result
control_table[]
#> from C
#> 1: col_A A
#> 2: col_B B
With the control table ready, I can use rowrecs_to_blocks()
from cdata to transform the data from row record (wide) form to block record (longer) form. The ID
variable is repeated as many times as necessary to complete the block records.
# transform to block form
DT <- cdata::rowrecs_to_blocks(
wideTable = df,
controlTable = control_table,
columnsToCopy = c("ID"))
setDT(DT)
# examine the result
DT[]
#> ID from C
#> 1: 1 col_A 4
#> 2: 1 col_B 2
#> 3: 2 col_A 5
#> 4: 2 col_B 3
You can omit the from
column, though by keeping it you can always recover the original data frame if you need to.
# omit the from column
DT[, from := NULL]
DT
#> ID C
#> 1: 1 4
#> 2: 1 2
#> 3: 2 5
#> 4: 2 3
CodePudding user response:
Using data.table
where melt
is the dplyr
's equivalent of pivot_longer
dt <- data.table(ID = 1:2, A = 4:5, B = 2:3)
dt <- melt(dt, measure.vars = c("A", "B"), value.name = "C")
dt[, variable := NULL]
setorder(dt, ID) # to get the same order as your desired output
dt
# ID C
# 1: 1 4
# 2: 1 2
# 3: 2 5
# 4: 2 3
CodePudding user response:
Just use by
.
DT[, .(C = c(A, B)), by = ID]
# ID C
# <int> <int>
# 1: 1 4
# 2: 1 2
# 3: 2 5
# 4: 2 3
Reproducible data
DT = data.table(ID = 1:2, A = 4:5, B = 2:3)