I would like to estimate a regression_forest with the grf package and remove the original data that is stored in the regression_forest output for data protection reasons.
The problem is that when I remove the data, R doesn't recognize the object as a regression_forest anymore and therefore throws an error.
Does anyone know how to work around this problem?
Here is a reproducible example:
library(grf)
# Train a standard regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.forest <- regression_forest(X, Y)
# Remove the original data
r.forest <- r.forest[-c(18,19)]
# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
r.pred <- predict(r.forest, X.test)
The last line causes the following error:
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list"
CodePudding user response:
The predict function seems to need to know the dimensions of the original data, but from what I can tell it doesn't need the data itself.
If you convert the original data stored in the model object to NA, then the predictions seem unaffected.
# Get original predictions
r.pred.original <- predict(r.forest, X.test)
# Convert stored data to NA
r.forest$X.orig[!is.na(r.forest$X.orig)] <- NA
# Get new predictions
r.pred.new <- predict(r.forest, X.test)
# Check that r.pred.original and r.pred.new are the same
all.equal(r.pred.original, r.pred.new)
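As for why the error in the question appears: in base R, subsetting a classed list with `[` keeps the names but drops the class attribute, so S3 dispatch can no longer find a `predict` method for the result. Here is a minimal sketch of that behaviour using a made-up toy object (`toy_model` and its `coef` and `data` fields are hypothetical and not part of grf). It contrasts dropping an element with overwriting its values in place:

```r
# Hypothetical S3 object standing in for a fitted model (not a grf object).
model <- structure(
  list(coef = c(1, 2), data = matrix(1:6, nrow = 3)),
  class = "toy_model"
)

# Dropping an element with `[` returns a plain list: the class is gone.
stripped <- model[names(model) != "data"]
class(stripped)

# Overwriting the values in place keeps both the class and the dimensions.
masked <- model
masked$data[] <- NA
class(masked)
dim(masked$data)
```

This is why overwriting the stored values, as in the answer, leaves the object usable while dropping list elements does not. If elements really must be dropped, reassigning the class afterwards would restore dispatch, but this sketch does not check whether `predict` still works on a forest that is missing those elements.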