I was wondering if there was a more terse way of manipulating variables (in my context dividing) variables that have near identical names.
Please consider the following dataset:
library(data.table)
n = 100
DT <- data.table(id = seq(1, n),
A1 = rnorm(n), A2 = rnorm(n), A3 = rnorm(n),
B1 = rnorm(n), B2 = rnorm(n), B3 = rnorm(n),
C1 = rnorm(n), C2 = rnorm(n), C3 = rnorm(n))
Here, there are 9 variables with the following pattern: There are string identifiers (the "A", "B", "C") and there are number identifiers (1,2,3). I would like to create variables that divide the various variables within the string identifiers like the following (if I were to do it manually):
DT[, `:=`(A_1_2 = A1/A2, A_1_3 = A1/A3, A_2_3 = A2/A3,
B_1_2 = B1/B2, B_1_3 = B1/B3, B_2_3 = B2/B3,
C_1_2 = C1/C2, C_1_3 = C1/C3, C_2_3 = C2/C3)]
Of course, with a smaller dataset, a manual approach is simple and easy. The problem is that the data is large in both the length of unique string and integer identifiers. A fast and terse solution would be greatly appreciated.
Thanks!
CodePudding user response:
We may use
for(nm in LETTERS[1:3]) {
DT[, paste(nm, 1, c(2,3), sep="_") := .(.SD[[1]]/.SD[[2]],
.SD[[1]]/.SD[[3]]), .SDcols = patterns(nm)]
}
-output
> head(DT)
id A1 A2 A3 B1 B2 B3 C1 C2 C3 A_1_2 A_1_3 B_1_2 B_1_3
<num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: 1 0.3542970 0.6165800 -0.1864986 -0.5327489 0.4524059 -1.0155552 0.4424219 1.0036830 -0.7000798 0.5746165 -1.8997306 -1.1775905 0.5245888
2: 1 0.3636432 0.5636047 0.2285641 -1.1874976 0.5418400 0.2764359 1.6741682 0.3488773 3.1991003 0.6452096 1.5909900 -2.1916020 -4.2957426
3: 1 -1.0641223 1.0731815 1.3408351 1.3454667 0.0338082 -1.2930563 -1.7898783 1.5873107 0.4569702 -0.9915585 -0.7936265 39.7970564 -1.0405322
4: 1 0.6483625 -0.3086274 -0.6715842 -1.0044038 -0.2900828 1.9253284 1.2584133 -0.3501206 0.4703024 -2.1007937 -0.9654225 3.4624735 -0.5216792
5: 1 0.6551013 1.2308940 -0.7867075 0.5416599 -0.8100544 0.5734997 0.6046887 0.5044593 -0.9537713 0.5322159 -0.8327127 -0.6686711 0.9444817
6: 1 -0.2912931 -1.7403816 0.8714699 -0.2484992 -1.7435360 -0.5650502 0.3410640 0.8594608 -0.4863061 0.1673731 -0.3342549 0.1425260 0.4397825
C_1_2 C_1_3
<num> <num>
1: 0.4407985 -0.6319592
2: 4.7987299 0.5233247
3: -1.1276168 -3.9168383
4: -3.5942282 2.6757535
5: 1.1986868 -0.6339976
6: 0.3968349 -0.7013361
Or may also use split.default
DT1 <- cbind(DT, DT[, lapply(split.default(.SD, sub("\\d ", "",
names(.SD))), \(x) replicate(2, x[[1]],
simplify = FALSE)/x[,-1, with = FALSE]), .SDcols = -1])
Update
If we need all the combinations
tmp <- DT[, lapply(split.default(.SD, sub("\\d ", "", names(.SD))),
function(x) {x1 <- combn(x, 2,
FUN = function(y) y[[1]]/y[[2]])
nm1 <- combn(names(x), 2, FUN = paste, collapse = "_")
setNames(as.data.frame(x1), nm1)
}), .SDcols = -1]
names(tmp) <- sub("^[^.] \\.", "", names(tmp))
DT1 <- cbind(DT, tmp)
-output
> head(DT1, 3)
id A1 A2 A3 B1 B2 B3 C1 C2 C3 A1_A2 A1_A3 A2_A3 B1_B2
<num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: 1 0.3542970 0.6165800 -0.1864986 -0.5327489 0.4524059 -1.0155552 0.4424219 1.0036830 -0.7000798 0.5746165 -1.8997306 -3.3060843 -1.177591
2: 1 0.3636432 0.5636047 0.2285641 -1.1874976 0.5418400 0.2764359 1.6741682 0.3488773 3.1991003 0.6452096 1.5909900 2.4658500 -2.191602
3: 1 -1.0641223 1.0731815 1.3408351 1.3454667 0.0338082 -1.2930563 -1.7898783 1.5873107 0.4569702 -0.9915585 -0.7936265 0.8003829 39.797056
B1_B3 B2_B3 C1_C2 C1_C3 C2_C3
<num> <num> <num> <num> <num>
1: 0.5245888 -0.44547643 0.4407985 -0.6319592 -1.4336693
2: -4.2957426 1.96009249 4.7987299 0.5233247 0.1090548
3: -1.0405322 -0.02614596 -1.1276168 -3.9168383 3.4735543