How can I improve a simple subtraction in a for loop in R?

Time:01-19

I want to subtract a vector (S_0) from every row of a data frame (S_t). Unfortunately, my for loop takes a very long time to run because there are 1 million rows.

n <- 1000000

X_t <- data.frame(matrix(0, nrow = n, ncol = 10))

for (i in seq_len(n)) {
  X_t[i, ] <- S_t[i, ] - S_0
}

S_0 is a vector of length 10

S_t is a data frame of dimension n x 10 containing values from prior calculations

My first idea was to turn S_0 into a matrix of dimension n x 10 in which every row is identical. Maybe subtracting a matrix from a matrix is faster? Unfortunately, I could not find out how to do this efficiently without another for loop.

Furthermore, I tried this:

data.frame(matrix(S_0, nrow = n, ncol = 10))

but the output was not what I expected as the order of the numbers was mixed up within every row.
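The rows look mixed up because matrix() fills its cells column by column by default. A minimal sketch of the difference, assuming S_0 is a plain vector (the values and the small n here are only for illustration, not the asker's data):

```r
S_0 <- 1:10   # illustrative values
n <- 3        # small n for display; the question uses 1000000

# Default: values are filled down the columns, so S_0 gets spread across rows
wrong <- matrix(S_0, nrow = n, ncol = 10)

# byrow = TRUE fills along the rows, so every row is a copy of S_0
right <- matrix(S_0, nrow = n, ncol = 10, byrow = TRUE)

wrong[1, ]
# [1]  1  4  7 10  3  6  9  2  5  8
right[1, ]
# [1]  1  2  3  4  5  6  7  8  9 10
```

A matrix built with byrow = TRUE has the same shape as S_t, so it could be subtracted from it in one step.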

CodePudding user response:

You can use col to repeat each element of S_0 down its column, which also keeps the type of S_t (data frame or matrix):

X_t <- S_t - S_0[col(S_t)]

Example with a data frame:

S_0 <- 1:10
S_t <- data.frame(matrix(0, nrow = 5, ncol = 10))

X_t <- S_t - S_0[col(S_t)]

X_t
#  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#1 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
#2 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
#3 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
#4 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
#5 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

str(X_t)
#'data.frame':   5 obs. of  10 variables:
# $ X1 : num  -1 -1 -1 -1 -1
# $ X2 : num  -2 -2 -2 -2 -2
# $ X3 : num  -3 -3 -3 -3 -3
# $ X4 : num  -4 -4 -4 -4 -4
# $ X5 : num  -5 -5 -5 -5 -5
# $ X6 : num  -6 -6 -6 -6 -6
# $ X7 : num  -7 -7 -7 -7 -7
# $ X8 : num  -8 -8 -8 -8 -8
# $ X9 : num  -9 -9 -9 -9 -9
# $ X10: num  -10 -10 -10 -10 -10

Example with a matrix:

S_t <- matrix(0, nrow = 5, ncol = 10)
X_t <- S_t - S_0[col(S_t)]
str(X_t)
# num [1:5, 1:10] -1 -1 -1 -1 -1 -2 -2 -2 -2 -2 ...
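To see why indexing with col works, it may help to look at what col returns on a tiny input. This is only a sketch: m and v are made-up stand-ins for S_t and S_0.

```r
m <- matrix(0, nrow = 2, ncol = 3)   # stand-in for S_t
v <- c(10, 20, 30)                   # stand-in for S_0

# col() gives, for every cell, the index of the column it sits in
idx <- col(m)
idx
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3

# Indexing v with it repeats each element once per row, in column-major order,
# which is exactly the order R uses when it subtracts a vector from a matrix
v[idx]
# [1] 10 10 20 20 30 30

m - v[idx]
```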

Another option is sweep, which also keeps the type of S_t:

sweep(S_t, 2, S_0)
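The short call above relies on sweep's defaults. A sketch of the same call with the arguments spelled out (MARGIN = 2 means "per column", and FUN defaults to "-"), compared with the col approach on a small matrix:

```r
S_0 <- 1:10
S_t <- matrix(0, nrow = 5, ncol = 10)

# Same as sweep(S_t, 2, S_0): subtract S_0[j] from every entry of column j
X_sweep <- sweep(S_t, MARGIN = 2, STATS = S_0, FUN = "-")

X_col <- S_t - S_0[col(S_t)]

all.equal(X_sweep, X_col)
```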

CodePudding user response:

You can use t twice. Note that the result is a matrix, not a data frame:

S_t <- data.frame(matrix(0, nrow = 1000000, ncol = 10))
S_0 <- 1:10

X_t <- t(t(S_t) - S_0)

# > head(X_t)
#      X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# [1,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# [2,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# [3,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# [4,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# [5,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
# [6,] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
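The double transpose works because R recycles a vector down the columns of a matrix. A small sketch with made-up values (m and v are illustrative, not from the question) shows why subtracting directly goes wrong, and why transposing first lines things up:

```r
m <- matrix(0, nrow = 2, ncol = 3)
v <- c(1, 2, 3)

# Direct subtraction recycles v down the columns, so rows get mixed values
bad <- m - v
bad[1, ]
# [1] -1 -3 -2

# After t(), each column of t(m) has length 3 and matches v element by element;
# the second t() restores the original orientation
good <- t(t(m) - v)
good[1, ]
# [1] -1 -2 -3
```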

Benchmark: the double transpose with t is the fastest of the three and allocates the least memory

bench::mark(t(t(S_t) - S_0),
            S_t - S_0[col(S_t)], 
            sweep(S_t, 2, S_0),
            check = FALSE, iterations = 10)
#  expression               min  median itr/s…¹ mem_a…² gc/se…³ n_itr
#1 t(t(S_t) - S_0)        211ms   321ms    3.10   229MB    2.17    10
#2 S_t - S_0[col(S_t)]    691ms   874ms    1.13   509MB    1.82    10
#3 sweep(S_t, 2, S_0)     638ms   735ms    1.34   548MB    2.54    10
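The benchmark needs check = FALSE because the three expressions do not return the same type when S_t is a data frame: the t version gives a matrix, the other two give a data frame. A rough sketch to confirm that the values still agree, on a small input (same_values is a hypothetical helper written for this example, not part of bench):

```r
S_0 <- 1:10
S_t <- data.frame(matrix(0, nrow = 5, ncol = 10))

res_t     <- t(t(S_t) - S_0)       # matrix
res_col   <- S_t - S_0[col(S_t)]   # data frame
res_sweep <- sweep(S_t, 2, S_0)    # data frame

# Hypothetical helper: compare numeric values only, ignoring class and dimnames
same_values <- function(a, b) {
  isTRUE(all.equal(unname(as.matrix(a)), unname(as.matrix(b))))
}

same_values(res_t, res_col)
same_values(res_t, res_sweep)
```

If a data frame is needed afterwards, the matrix from the t version can be wrapped in as.data.frame(), at the cost of an extra conversion that the benchmark above does not measure.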