Remove original data from regression_forest without changing the type to "list"

Time:07-02

I would like to fit a regression_forest with the grf package and then remove the original data stored in the fitted object, for data protection reasons.

The problem is that when I remove the data, R doesn't recognize the object as a regression_forest anymore and therefore throws an error.

Does anyone know how to work around this problem?

Here is a reproducible example:

library(grf)

# Train a standard regression forest.
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)
r.forest <- regression_forest(X, Y)

# Remove the original data by dropping list elements 18 and 19
r.forest <- r.forest[-c(18, 19)]

# Predict using the forest.
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)
r.pred <- predict(r.forest, X.test)

The last line causes the following error:

Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list"
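
The class is lost because single-bracket subsetting of a list that has no dedicated `[` method returns a plain list: base R drops every attribute except the names, and that includes the class. Here is a minimal base-R sketch of that behaviour, using a made-up `toy_model` class rather than a real grf object:

```r
# A stand-in S3 object; "toy_model" is a hypothetical class used only for illustration.
model <- structure(
  list(coef = c(1, 2), X.orig = matrix(0, 2, 2), Y.orig = c(0, 0)),
  class = "toy_model"
)

# Single-bracket subsetting returns a plain list: the class attribute is dropped.
trimmed <- model[-c(2, 3)]
class_after_subset <- class(trimmed)

# Assigning to a component with `$<-` modifies the object and keeps its class.
kept <- model
kept$X.orig <- NULL
class_after_assign <- class(kept)

print(class_after_subset)
print(class_after_assign)
```

Restoring the class by hand (`class(trimmed) <- class(model)`) would make method dispatch work again, but a method may still look up the removed components, so whether that alone is enough for `predict` on a real forest would need to be checked.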

Answer:

The predict method seems to need the dimensions of the original data, but as far as I can tell it doesn't need the values themselves.

If you set the original data stored in the model object to NA, the predictions appear to be unaffected. Assigning into r.forest$X.orig also leaves the object's class intact, unlike subsetting the object with [.

# Get original predictions (from the freshly trained forest, before anything is removed)
r.pred.original <- predict(r.forest, X.test)

# Convert stored data to NA
r.forest$X.orig[!is.na(r.forest$X.orig)] <- NA

# Get new predictions
r.pred.new <- predict(r.forest, X.test)

# r.pred.original and r.pred.new are the same
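
The comparison in the last comment can be checked programmatically instead of by eye. The sketch below rests on several assumptions: `blank_values()` is a hypothetical helper (not part of grf) that overwrites every entry of a matrix with NA while keeping its dimensions, the fitted object stores the training covariates as `X.orig` as in the answer, and `predict()` returns a data frame with a `predictions` column. The grf part only runs if the package is installed.

```r
# Hypothetical helper: replace all values with NA but keep dim and dimnames.
blank_values <- function(m) {
  m[] <- NA_real_
  m
}

# Base-R check of the helper on a small matrix.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
blanked <- blank_values(m)

same_predictions <- NA  # stays NA if grf is not installed
if (requireNamespace("grf", quietly = TRUE)) {
  n <- 50
  p <- 10
  X <- matrix(rnorm(n * p), n, p)
  Y <- X[, 1] * rnorm(n)
  forest <- grf::regression_forest(X, Y)

  X.test <- matrix(0, 101, p)
  X.test[, 1] <- seq(-2, 2, length.out = 101)

  before <- predict(forest, X.test)
  forest$X.orig <- blank_values(forest$X.orig)  # `$<-` keeps the object's class
  after <- predict(forest, X.test)

  # TRUE if the predictions agree up to numerical tolerance.
  same_predictions <- isTRUE(all.equal(before$predictions, after$predictions))
}
```

Note that this only blanks `X.orig`. The object may hold other training data as well (inspecting `names(r.forest)` shows what is stored), and whether those components can be blanked without changing the predictions would need the same kind of check.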