R apply custom function without for loop, expected output is matrix

Time:10-11

How can I apply a custom function, with multiple arguments, to return an N x M dataframe without a for loop?

For example:

mat1 <- data.frame(a = 1:5, b = -1)
vec1 <- 100:107

myfcn <- function(x, vals){
  ans <- (x + vals[1]) * vals[2]
  return(ans)
}

df <- data.frame(matrix(nrow = nrow(mat1), ncol = length(vec1))) # pre-allocate: one row per row of mat1

for (i in 1:length(vec1)){
  for (j in 1:nrow(mat1)){
    result <- myfcn(vec1[i], vals = c(mat1$a[j], mat1$b[j]))
    df[j,i] <- result
    }
  }
  
print(df)

This returns the desired output: a data frame with one row per row of mat1 and one column per element of vec1.

How can I skip the for-loop and use some kind of apply function to get the above output matrix?

I tried sapply(vec1, myfcn, vals = c(mat1$a, mat1$b)), but that does not give the desired matrix.

And I tried outer(vec1, c(mat1$a, mat1$b), myfcn), but that does not give the desired matrix either.

CodePudding user response:

You can simply do this:

result = sapply(vec1, \(v) (v + mat1$a) * mat1$b)

If you want to use myfcn, make a slight change to it:

myfcn <- function(x, v1, v2) (x + v1) * v2
result = sapply(vec1,myfcn,v1=mat1$a, v2=mat1$b)

Output:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] -101 -102 -103 -104 -105 -106 -107 -108
[2,] -102 -103 -104 -105 -106 -107 -108 -109
[3,] -103 -104 -105 -106 -107 -108 -109 -110
[4,] -104 -105 -106 -107 -108 -109 -110 -111
[5,] -105 -106 -107 -108 -109 -110 -111 -112

If you want to convert to a frame with the same names above, add the following:

setNames(as.data.frame(result), paste0("X",1:ncol(result)))

Output:

    X1   X2   X3   X4   X5   X6   X7   X8
1 -101 -102 -103 -104 -105 -106 -107 -108
2 -102 -103 -104 -105 -106 -107 -108 -109
3 -103 -104 -105 -106 -107 -108 -109 -110
4 -104 -105 -106 -107 -108 -109 -110 -111
5 -105 -106 -107 -108 -109 -110 -111 -112
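
As a rough illustration of why the sapply call in the question falls short, here is a minimal sketch. It assumes the question's mat1 and vec1; myfcn2 is a hypothetical name for the modified function so that both versions can sit side by side.

```r
mat1 <- data.frame(a = 1:5, b = -1)
vec1 <- 100:107

# Signature from the question: only vals[1] and vals[2] are ever read.
myfcn <- function(x, vals) (x + vals[1]) * vals[2]

# vals starts with 1, 2 (the first two elements of mat1$a),
# so every call computes (x + 1) * 2 and returns a single number.
attempt <- sapply(vec1, myfcn, vals = c(mat1$a, mat1$b))
length(attempt)  # one value per element of vec1, not a matrix

# Hypothetical renamed variant with the two columns passed separately.
# Each call now returns a vector with one value per row of mat1,
# and sapply binds those vectors as the columns of a matrix.
myfcn2 <- function(x, v1, v2) (x + v1) * v2
result <- sapply(vec1, myfcn2, v1 = mat1$a, v2 = mat1$b)
dim(result)  # rows follow mat1, columns follow vec1
```

The point is that sapply only builds a matrix when each call returns a vector, which is what passing whole columns as v1 and v2 achieves.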

CodePudding user response:

First of all, you do not need a double for loop to populate df. R is vectorized, so you can pass the entire vec1 to the function on each pass through the loop.

mat1 <- data.frame(a = 1:5, b = -1)
vec1 <- 100:107

myfcn <- function(x, vals){
  ans <- (x + vals[1]) * vals[2]
  return(ans)
}

df <- as.data.frame(matrix(nrow = nrow(mat1), ncol = length(vec1))) # pre-allocate: one row per row of mat1

for (j in 1:nrow(mat1)){
  result <- myfcn(vec1, vals = c(mat1$a[j], mat1$b[j]))
  df[j,] <- result
}
df
#>     V1   V2   V3   V4   V5   V6   V7   V8
#> 1 -101 -102 -103 -104 -105 -106 -107 -108
#> 2 -102 -103 -104 -105 -106 -107 -108 -109
#> 3 -103 -104 -105 -106 -107 -108 -109 -110
#> 4 -104 -105 -106 -107 -108 -109 -110 -111
#> 5 -105 -106 -107 -108 -109 -110 -111 -112

Created on 2022-10-10 with reprex v2.0.2

Now with a sapply loop. You don't even have to pre-allocate space, because sapply builds the result on its own.

df2 <- t(sapply(1:nrow(mat1), \(j) myfcn(vec1, vals = c(mat1$a[j], mat1$b[j]))))
df2 <- as.data.frame(df2)

identical(df, df2)
#> [1] TRUE

Created on 2022-10-10 with reprex v2.0.2

Note also that if you need a tabular data structure but not specifically a data.frame, then the last as.data.frame is not needed.
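
If a matrix is enough, one possible variation (a sketch, not part of the answer above) is vapply, which makes you declare the shape of each result up front and fails loudly if a call returns something else. seq_len(nrow(mat1)) is used in place of 1:nrow(mat1) so that an empty mat1 would not produce the sequence 1, 0. The name res is only for illustration.

```r
mat1 <- data.frame(a = 1:5, b = -1)
vec1 <- 100:107

myfcn <- function(x, vals) (x + vals[1]) * vals[2]

# Each call returns one numeric value per element of vec1,
# which is declared via FUN.VALUE.
res <- vapply(
  seq_len(nrow(mat1)),
  function(j) myfcn(vec1, vals = c(mat1$a[j], mat1$b[j])),
  FUN.VALUE = numeric(length(vec1))
)

# vapply stores each result as a column, so transpose to get
# one row per row of mat1 and one column per element of vec1.
res <- t(res)
dim(res)
```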

CodePudding user response:

You were on the right lines with outer():

outer(mat1$a, vec1, `+`) * mat1$b
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] -101 -102 -103 -104 -105 -106 -107 -108
# [2,] -102 -103 -104 -105 -106 -107 -108 -109
# [3,] -103 -104 -105 -106 -107 -108 -109 -110
# [4,] -104 -105 -106 -107 -108 -109 -110 -111
# [5,] -105 -106 -107 -108 -109 -110 -111 -112

This will also work if mat1$b is -1:-5, for example - not just in cases where it's a constant.
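
A small check of the non-constant case, as a sketch: a and b stand in for mat1$a and mat1$b, with b set to -1:-5 as suggested above. It works because R recycles b down the columns of the matrix, so row i is multiplied by b[i].

```r
a <- 1:5
b <- -1:-5
vec1 <- 100:107

# outer() adds every element of a to every element of vec1 (5 x 8 matrix);
# multiplying by b then scales row i by b[i] through column-wise recycling.
res <- outer(a, vec1, `+`) * b

# Reference computed one column at a time for comparison.
ref <- sapply(vec1, function(x) (x + a) * b)

all.equal(res, ref)
res[2, 1]  # (2 + 100) * -2
```

Note that this relies on length(b) matching the number of rows. A b of some other length would be recycled silently and give wrong values.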

CodePudding user response:

pmap is the most elegant solution in my opinion.

In this particular case, there is an extra step to create the input data with vectors of the same length, but then the apply statement is as simple as df$result <- pmap_dbl(df, foo)

library(tidyverse)

foo <- function(x, a, b){
  ans <- (x + a) * b
  return(ans)
}

# input data
df <- data.frame(a = 1:5, b = -1) %>% expand_grid(x = 100:107)

# pmap will use named arguments if the first argument has names
df$result  <- pmap_dbl(df, foo) 

# if argument has no names, will use them as ordered
df$result2 <- pmap_dbl(with(df, list(x, a, b)), foo) 

print(df, n = Inf)
#> # A tibble: 40 × 5
#>        a     b     x result result2
#>    <int> <dbl> <int>  <dbl>   <dbl>
#>  1     1    -1   100   -101    -101
#>  2     1    -1   101   -102    -102
#>  3     1    -1   102   -103    -103
#>  4     1    -1   103   -104    -104
#>  5     1    -1   104   -105    -105
#>  6     1    -1   105   -106    -106
#>  7     1    -1   106   -107    -107
#>  8     1    -1   107   -108    -108
#>  9     2    -1   100   -102    -102
#> 10     2    -1   101   -103    -103
#> 11     2    -1   102   -104    -104
#> 12     2    -1   103   -105    -105
#> 13     2    -1   104   -106    -106
#> 14     2    -1   105   -107    -107
#> 15     2    -1   106   -108    -108
#> 16     2    -1   107   -109    -109
#> 17     3    -1   100   -103    -103
#> 18     3    -1   101   -104    -104
#> 19     3    -1   102   -105    -105
#> 20     3    -1   103   -106    -106
#> 21     3    -1   104   -107    -107
#> 22     3    -1   105   -108    -108
#> 23     3    -1   106   -109    -109
#> 24     3    -1   107   -110    -110
#> 25     4    -1   100   -104    -104
#> 26     4    -1   101   -105    -105
#> 27     4    -1   102   -106    -106
#> 28     4    -1   103   -107    -107
#> 29     4    -1   104   -108    -108
#> 30     4    -1   105   -109    -109
#> 31     4    -1   106   -110    -110
#> 32     4    -1   107   -111    -111
#> 33     5    -1   100   -105    -105
#> 34     5    -1   101   -106    -106
#> 35     5    -1   102   -107    -107
#> 36     5    -1   103   -108    -108
#> 37     5    -1   104   -109    -109
#> 38     5    -1   105   -110    -110
#> 39     5    -1   106   -111    -111
#> 40     5    -1   107   -112    -112

Created on 2022-10-10 with reprex v2.0.2
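
The result above is in long format (one row per combination), while the question asks for an N x M table. Here is a base R sketch of the same idea, using expand.grid and mapply as stand-ins for expand_grid and pmap_dbl (an assumption for illustration, not the purrr API), followed by a reshape to a matrix. The names idx, long and wide are hypothetical.

```r
mat1 <- data.frame(a = 1:5, b = -1)
vec1 <- 100:107

foo <- function(x, a, b) (x + a) * b

# All combinations of vec1 positions (i) and mat1 rows (j).
# expand.grid varies its first column fastest, so x cycles within each row of mat1,
# matching the row order of the tibble shown above.
idx <- expand.grid(i = seq_along(vec1), j = seq_len(nrow(mat1)))
long <- data.frame(x = vec1[idx$i], a = mat1$a[idx$j], b = mat1$b[idx$j])

# One call to foo per row, with arguments matched by position.
long$result <- mapply(foo, long$x, long$a, long$b)

# Each consecutive run of length(vec1) values belongs to one row of mat1.
wide <- matrix(long$result, nrow = nrow(mat1), byrow = TRUE)
dim(wide)
```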
