How to use custom function with Apache Arrow in R?

Time:03-19

I am trying to learn Apache Arrow with R, but I cannot find out how to write a user-defined function that works with Arrow.

library(arrow)
#> See arrow_info() for available features
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

As a simple example, this function returns the average of a vector:

f1 <- function(x) {
  
  x <- Array$create(x)
  
  res <- mean(x, na.rm = TRUE)
  
  return(as.vector(res))
}

If I try to use my f1 function, I get the warning below, and the data is pulled into R before the computation runs.

ds <- arrow_table(head(mtcars, 6))

ds %>% 
  mutate(mpg2 = f1(mpg)) %>% 
  collect()
#> Warning: Expression f1(mpg) not supported in Arrow; pulling data into R
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg2
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 20.5
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 20.5
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 20.5
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 20.5
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 20.5
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 20.5

Is there any way to use a custom function within Arrow in R?

Created on 2022-03-18 by the reprex package (v2.0.1)

CodePudding user response:

That appears to be the documented behaviour:

If you try to call a function which does not have arrow mapping, the data will be pulled back into R, and you will see a warning message.
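
For the mean in the question, one way to avoid the fallback is to use only functions that do have an Arrow mapping. The sketch below assumes that mean() inside summarise() is mapped in the installed arrow version, and that a length-one R value passed to mutate() is handed to Arrow as a literal. The names mpg_mean and res are introduced here for illustration only.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

ds <- arrow_table(head(mtcars, 6))

# Step 1: compute the mean inside Arrow.
# Assumption: mean() has an Arrow mapping when used in summarise().
mpg_mean <- ds %>%
  summarise(m = mean(mpg, na.rm = TRUE)) %>%
  collect() %>%
  pull(m)

# Step 2: attach the single value as a new column.
# Assumption: a length-one R value is recycled as a literal,
# so no unmapped R function is called on the Arrow side.
res <- ds %>%
  mutate(mpg2 = mpg_mean) %>%
  collect()

print(res)
```

Splitting the work into an aggregation step and a mutate step keeps both queries within what Arrow can evaluate, so the "pulling data into R" warning should not appear.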

The fallback makes some sense if you think about it: the Arrow compute 'backend' does not contain an embedded R interpreter, so we probably cannot expect to send arbitrary R functions down to it.
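
Newer releases of the arrow package (9.0.0 and later, so after this question was asked) appear to offer a partial way around this: register_scalar_function() registers an R function that Arrow calls back into R to evaluate. It only covers scalar (element-wise) functions, so it would not help with an aggregate such as the mean in f1. The following is a minimal sketch under those assumptions; the function name mpg_to_kpl and the unit conversion are hypothetical and chosen only to illustrate the call.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical element-wise function: miles per gallon -> kilometres per litre.
# The first argument, `context`, is supplied by Arrow when it calls the function.
register_scalar_function(
  "mpg_to_kpl",
  function(context, mpg) mpg * 0.425144,
  in_type = schema(mpg = float64()),
  out_type = float64(),
  auto_convert = TRUE  # pass plain R vectors to the function and convert the result back
)

ds <- arrow_table(head(mtcars, 6))

# The registered name can then be used inside a dplyr query on the Arrow table.
res <- ds %>%
  mutate(kpl = mpg_to_kpl(mpg)) %>%
  collect()

print(res)
```

Because the function still runs in R, this is mainly a convenience for keeping the rest of the query in Arrow; it should not be expected to run as fast as a built-in Arrow compute function.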
