I am trying to learn Apache Arrow with R, but I cannot find out how to write a user-defined function that works with Arrow.
library(arrow)
#> See arrow_info() for available features
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
My function simply returns the average of a vector:
f1 <- function(x) {
  x <- Array$create(x)
  res <- mean(x, na.rm = TRUE)
  return(as.vector(res))
}
If I try to use my f1 function, I get this warning, and the data is pulled into R before the computation:
ds <- arrow_table(head(mtcars, 6))
ds %>%
  mutate(mpg2 = f1(mpg)) %>%
  collect()
#> Warning: Expression f1(mpg) not supported in Arrow; pulling data into R
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg2
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 20.5
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 20.5
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 20.5
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 20.5
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 20.5
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 20.5
Is there any way to use a custom function within Arrow in R?
Created on 2022-03-18 by the reprex package (v2.0.1)
Answer:
That appears to be the documented behaviour:
If you try to call a function which does not have arrow mapping, the data will be pulled back into R, and you will see a warning message.
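To illustrate the other side of that statement, here is a minimal sketch that stays inside Arrow by using a function that does have an Arrow mapping. It assumes that mean() inside summarise() is one of the mapped functions in your installed arrow version, and it reuses the six-row mtcars table from the question.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

ds <- arrow_table(head(mtcars, 6))

# mean() is translated to an Arrow compute kernel inside summarise(),
# so no "pulling data into R" warning is expected here.
res <- ds %>%
  summarise(mpg2 = mean(mpg, na.rm = TRUE)) %>%
  collect()

print(res)
```

Only the one-row result is brought into R by collect(); the aggregation itself runs in Arrow.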
The fallback makes sense if you think about it: the Arrow 'backend' does not contain an embedded R interpreter, so we probably cannot expect to send arbitrary R functions down to it.
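That said, newer releases of the arrow package (10.0.0 and later, as far as I know, which is after this question was asked) offer register_scalar_function() for registering an R function with the Arrow query engine. The sketch below is an assumption-laden illustration, not a drop-in replacement for f1: the name "double_it" is made up, and the mechanism only covers element-wise (scalar) functions, so an aggregate such as a mean does not fit it.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# "double_it" is a hypothetical name chosen for this example.
register_scalar_function(
  "double_it",
  function(context, x) x * 2,       # the first argument is always the call context
  in_type = schema(x = float64()),  # one float64 input column
  out_type = float64(),             # one float64 value per input row
  auto_convert = TRUE               # pass R vectors in and convert the result back
)

ds <- arrow_table(head(mtcars, 6))

out <- ds %>%
  mutate(mpg_doubled = double_it(mpg)) %>%
  collect()

print(out[, c("mpg", "mpg_doubled")])
```

The R function is still evaluated by the R session that registered it, so this is a convenience for staying inside an Arrow query rather than a way to run R code natively in the Arrow engine.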