For a machine learning project, I was wondering: is it possible to use RandomForestRegressor inside a pipeline?
Specifically, I need to get the OOB score from a RandomForestRegressor, but my data requires a lot of preprocessing.
I tried several things, and this is the closest so far:
# Creation of the pipeline
rand_piped = Pipeline([
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor(max_depth=3, random_state=0, oob_score=True))
])
# Fitting our model
rand_piped.fit(df_X_train, df_Y_train.values.ravel())
# Getting our metrics and predictions
oob_score = rand_piped.oob_score_
I suspect the problem is that I don't fully understand how this works, so feel free to correct me. The code raises this error:
Traceback (most recent call last):
  File "/home/user/my_rf.py", line 15, in <module>
    oob_score = rand_piped.oob_score_
AttributeError: 'Pipeline' object has no attribute 'oob_score_'
Answer:
Pipelines are subscriptable, so you can look up the oob_score_ attribute on the fitted model step:
>>> rand_piped["model"].oob_score_
0.9297212997034854
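For anyone who wants to try this end to end, here is a minimal sketch. The synthetic data and the `ColumnTransformer` preprocessor are stand-ins I made up for illustration (the question does not show the real `preprocessor` or data), so swap in your own. It also shows two equivalent lookups that should give the same value: `named_steps["model"]` and `rand_piped[-1]`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; use df_X_train / df_Y_train instead).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Hypothetical preprocessor: standardize all three columns.
preprocessor = ColumnTransformer([("scale", StandardScaler(), [0, 1, 2])])

rand_piped = Pipeline([
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor(max_depth=3, random_state=0, oob_score=True)),
])
rand_piped.fit(X, y)

# oob_score_ is set on the fitted forest, not on the Pipeline wrapper,
# so it has to be read from the "model" step.
oob_by_key = rand_piped["model"].oob_score_
oob_by_name = rand_piped.named_steps["model"].oob_score_
oob_by_position = rand_piped[-1].oob_score_  # last step of the pipeline

print(oob_by_key)
```

Note that the OOB score is computed on whatever the forest receives during fit, which is the output of the preprocessing steps fitted on the full training set.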