Home > Blockchain >  R data.table Vectorising Column Multiplication
R data.table Vectorising Column Multiplication

Time:11-30

Suppose I have a data.table like this (imagine it has many columns like "a1, ..., a100, ..." and similarly "b1, ..., b100, ...")

dt <- data.table(id = 1:5, a1 = runif(5), a2 = runif(5), b1 = runif(5), b2 = runif(5))

so that the output looks like this:

   id         a1         a2         b1         b2
1:  1 0.94431156 0.34668771 0.54899478 0.91512664
2:  2 0.32730005 0.87924651 0.88777763 0.90167832
3:  3 0.07438915 0.53728539 0.21463741 0.11291512
4:  4 0.23025893 0.08528074 0.68454936 0.45441690
5:  5 0.86105462 0.49976703 0.07362091 0.08834252

I want to create new columns c1, c2 such that effectively I have

dt[, c('c1', 'c2') := .(a1*b1, a2*b2)]

with output

   id         a1         a2         b1         b2         c1         c2
1:  1 0.94431156 0.34668771 0.54899478 0.91512664 0.51842212 0.31726316
2:  2 0.32730005 0.87924651 0.88777763 0.90167832 0.29056966 0.79279752
3:  3 0.07438915 0.53728539 0.21463741 0.11291512 0.01596669 0.06066765
4:  4 0.23025893 0.08528074 0.68454936 0.45441690 0.15762361 0.03875301
5:  5 0.86105462 0.49976703 0.07362091 0.08834252 0.06339162 0.04415068

How can this be achieved without the use of slow loops?

CodePudding user response:

Since matrix multiplication is element-wise in R, you can do the following:

dt <- data.frame(id = 1:5, a1 = runif(5), a2 = runif(5), b1 = runif(5), b2 = runif(5))
a = dt[,2:3]
b = dt[,4:5]
c = a * b
names(c) = c("c1", "c2")
cbind(dt, c)

Output

  id          a1        a2        b1        b2          c1         c2
1  1 0.082863389 0.7108292 0.8952547 0.4530363 0.074183837 0.32203140
2  2 0.125423227 0.8957771 0.2231827 0.1042432 0.027992292 0.09337865
3  3 0.278592590 0.9317453 0.7910442 0.3729406 0.220379066 0.34748565
4  4 0.004518196 0.3890797 0.5323291 0.7997701 0.002405167 0.31117430
5  5 0.784290484 0.5499781 0.7429104 0.8106772 0.582657582 0.44585471

CodePudding user response:

Taking @GoldenGateBridge 's answer into account, and assuming you don't know how big your table can be, you could generalize in the form:

set.seed(10)
dt <- data.frame(id = 1:5, 
                 a1 = runif(5),
                 a2 = runif(5),
                 a3 = runif(5),
                 a4 = runif(5),
                 a5 = runif(5),
                 # ....
                 b1 = runif(5), 
                 b2 = runif(5),
                 b3 = runif(5),
                 b4 = runif(5),
                 b5 = runif(5)
                 # ....
)

n <- 1:((ncol(x)-1)/2)
ind_a <- paste0("a", n)
ind_b <- paste0("b", n)
ind_c <- paste0("c", n)

producto <- dt[ind_a]*dt[ind_b]
names(producto) <- ind_c
dt <- cbind(dt, producto)

I hope it is useful.

  • Related