Home > Software engineering >  Manipulation of Variables with Similar Names in data.table
Manipulation of Variables with Similar Names in data.table

Time:06-19

I was wondering if there was a more terse way of manipulating variables (in my context dividing) variables that have near identical names.

Please consider the following dataset:

library(data.table)
n = 100
DT <- data.table(id = seq(1, n),
                 A1 = rnorm(n), A2 = rnorm(n), A3 = rnorm(n),
                 B1 = rnorm(n), B2 = rnorm(n), B3 = rnorm(n),
                 C1 = rnorm(n), C2 = rnorm(n), C3 = rnorm(n))

Here, there are 9 variables with the following pattern: There are string identifiers (the "A", "B", "C") and there are number identifiers (1,2,3). I would like to create variables that divide the various variables within the string identifiers like the following (if I were to do it manually):

DT[, `:=`(A_1_2 = A1/A2, A_1_3 = A1/A3, A_2_3 = A2/A3,
          B_1_2 = B1/B2, B_1_3 = B1/B3, B_2_3 = B2/B3,
          C_1_2 = C1/C2, C_1_3 = C1/C3, C_2_3 = C2/C3)]

Of course, with a smaller dataset, a manual approach is simple and easy. The problem is that the data is large in both the length of unique string and integer identifiers. A fast and terse solution would be greatly appreciated.

Thanks!

CodePudding user response:

We may use

for(nm in LETTERS[1:3]) {

   DT[, paste(nm, 1, c(2,3), sep="_") := .(.SD[[1]]/.SD[[2]], 
              .SD[[1]]/.SD[[3]]), .SDcols = patterns(nm)]
}

-output

> head(DT)
      id         A1         A2         A3         B1         B2         B3         C1         C2         C3      A_1_2      A_1_3      B_1_2      B_1_3
   <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>
1:     1  0.3542970  0.6165800 -0.1864986 -0.5327489  0.4524059 -1.0155552  0.4424219  1.0036830 -0.7000798  0.5746165 -1.8997306 -1.1775905  0.5245888
2:     1  0.3636432  0.5636047  0.2285641 -1.1874976  0.5418400  0.2764359  1.6741682  0.3488773  3.1991003  0.6452096  1.5909900 -2.1916020 -4.2957426
3:     1 -1.0641223  1.0731815  1.3408351  1.3454667  0.0338082 -1.2930563 -1.7898783  1.5873107  0.4569702 -0.9915585 -0.7936265 39.7970564 -1.0405322
4:     1  0.6483625 -0.3086274 -0.6715842 -1.0044038 -0.2900828  1.9253284  1.2584133 -0.3501206  0.4703024 -2.1007937 -0.9654225  3.4624735 -0.5216792
5:     1  0.6551013  1.2308940 -0.7867075  0.5416599 -0.8100544  0.5734997  0.6046887  0.5044593 -0.9537713  0.5322159 -0.8327127 -0.6686711  0.9444817
6:     1 -0.2912931 -1.7403816  0.8714699 -0.2484992 -1.7435360 -0.5650502  0.3410640  0.8594608 -0.4863061  0.1673731 -0.3342549  0.1425260  0.4397825
        C_1_2      C_1_3
        <num>      <num>
1:  0.4407985 -0.6319592
2:  4.7987299  0.5233247
3: -1.1276168 -3.9168383
4: -3.5942282  2.6757535
5:  1.1986868 -0.6339976
6:  0.3968349 -0.7013361

Or may also use split.default

DT1 <-  cbind(DT, DT[, lapply(split.default(.SD, sub("\\d ", "", 
    names(.SD))), \(x) replicate(2, x[[1]], 
    simplify = FALSE)/x[,-1, with = FALSE]), .SDcols = -1])

Update

If we need all the combinations

tmp <- DT[, lapply(split.default(.SD, sub("\\d ", "", names(.SD))), 
     function(x) {x1 <- combn(x, 2, 
     FUN = function(y) y[[1]]/y[[2]])
      nm1 <- combn(names(x), 2, FUN = paste, collapse = "_")
      setNames(as.data.frame(x1), nm1)
      }),  .SDcols = -1]
names(tmp) <- sub("^[^.] \\.", "", names(tmp))
DT1 <- cbind(DT, tmp)

-output

> head(DT1, 3)
      id         A1        A2         A3         B1        B2         B3         C1        C2         C3      A1_A2      A1_A3      A2_A3     B1_B2
   <num>      <num>     <num>      <num>      <num>     <num>      <num>      <num>     <num>      <num>      <num>      <num>      <num>     <num>
1:     1  0.3542970 0.6165800 -0.1864986 -0.5327489 0.4524059 -1.0155552  0.4424219 1.0036830 -0.7000798  0.5746165 -1.8997306 -3.3060843 -1.177591
2:     1  0.3636432 0.5636047  0.2285641 -1.1874976 0.5418400  0.2764359  1.6741682 0.3488773  3.1991003  0.6452096  1.5909900  2.4658500 -2.191602
3:     1 -1.0641223 1.0731815  1.3408351  1.3454667 0.0338082 -1.2930563 -1.7898783 1.5873107  0.4569702 -0.9915585 -0.7936265  0.8003829 39.797056
        B1_B3       B2_B3      C1_C2      C1_C3      C2_C3
        <num>       <num>      <num>      <num>      <num>
1:  0.5245888 -0.44547643  0.4407985 -0.6319592 -1.4336693
2: -4.2957426  1.96009249  4.7987299  0.5233247  0.1090548
3: -1.0405322 -0.02614596 -1.1276168 -3.9168383  3.4735543
  • Related