sklearn: print DecisionTreeRegressor's tree from IterativeImputer

I have an IterativeImputer that uses a DecisionTreeRegressor as its estimator, and I want to print its tree with the export_text function:

import pandas as pd
from sklearn import tree
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor(criterion="squared_error", 
                                  max_depth=None, 
                                  min_samples_split=2,
                                  min_samples_leaf=1, 
                                  random_state=0)
iterative_imputer = IterativeImputer(
    estimator=regressor,
    sample_posterior=False,
    max_iter=10,
    initial_strategy='mean',
    imputation_order='roman',
    verbose=2,
    random_state=0)
iterative_imputer.fit(df)
print(tree.export_text(iterative_imputer.estimator))

But I'm getting an error:

sklearn.exceptions.NotFittedError: This DecisionTreeRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

What am I doing wrong?

CodePudding user response：

Here's how I would do this:

from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor, export_text

# Create a DecisionTreeRegressor object
estimator = DecisionTreeRegressor()

# Create an IterativeImputer object using the DecisionTreeRegressor as the estimator
imputer = IterativeImputer(estimator=estimator)

# Fit the IterativeImputer to your data
imputer.fit(X)

# Print the fitted trees. The imputer fits clones of `estimator`,
# which it stores in imputation_sequence_; `estimator` itself stays unfitted.
for _, _, fitted_tree in imputer.imputation_sequence_:
    print(export_text(fitted_tree))

The export_text function returns the tree as a text representation, which you can then print to the console or save to a file.

Keep in mind that the IterativeImputer does not fit the DecisionTreeRegressor object you pass in. It fits clones of it, one per imputed feature per imputation round, so the trees worth printing are those fitted clones rather than the original estimator.

CodePudding user response：

The error occurs because iterative_imputer.estimator is never fitted itself: the imputer clones it before each fit, so it only serves as the template from which all fitted estimators are created.
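The cloning behaviour can be checked directly. The snippet below is a minimal sketch, assuming a small toy array; the helper is_fitted is a hypothetical name introduced here, built on scikit-learn's check_is_fitted.

```python
import numpy as np
from sklearn.base import clone
from sklearn.exceptions import NotFittedError
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Toy data (an assumption for illustration) with one missing value.
X = np.array([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]], dtype=float)

regressor = DecisionTreeRegressor(random_state=0)
imputer = IterativeImputer(estimator=regressor, random_state=0)
imputer.fit(X)

# The imputer still holds the very object that was passed in...
same_object = imputer.estimator is regressor
# ...but fitting the imputer leaves that object unfitted.
template_fitted = is_fitted(regressor)

# clone() copies parameters only; fitting the copy does not touch the original.
copy_fitted = is_fitted(clone(regressor).fit(X[:, [0]], X[:, 2]))

print(same_object, template_fitted, copy_fitted)
```

This is why passing iterative_imputer.estimator to export_text raises NotFittedError even after the imputer has been fitted.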

After fitting, the estimators are stored as a list of _ImputerTriplet objects in the imputation_sequence_ attribute. Each triplet holds (feat_idx, neighbor_feat_idx, estimator). They can be accessed (scikit-learn==1.2.0) with:

import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import export_text

regressor = DecisionTreeRegressor(random_state=0)
iterative_imputer = IterativeImputer(
    estimator=regressor,
    max_iter=10,
    imputation_order='roman',
    random_state=0,
)

iterative_imputer.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

for _, _, estimator in iterative_imputer.imputation_sequence_:
    print(export_text(estimator))

|--- feature_1 <= 7.50
|   |--- feature_0 <= 2.75
|   |   |--- value: [7.00]
|   |--- feature_0 >  2.75
|   |   |--- value: [4.00]
|--- feature_1 >  7.50
|   |--- value: [10.00]

...
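Because each tree is fitted on only the predictor columns, the feature_N labels in the output are positions within that reduced set, not columns of the original data. The sketch below shows one way to label each tree using the feat_idx and neighbor_feat_idx fields of the triplet. The column names "a", "b", "c" are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical column names for the toy data, for illustration only.
columns = ["a", "b", "c"]
X = np.array([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]], dtype=float)

imputer = IterativeImputer(
    estimator=DecisionTreeRegressor(random_state=0),
    max_iter=10,
    imputation_order="roman",
    random_state=0,
)
imputer.fit(X)

reports = []
for feat_idx, neighbor_feat_idx, fitted_tree in imputer.imputation_sequence_:
    # Each tree predicts column `feat_idx` from the columns in `neighbor_feat_idx`.
    predictor_names = [columns[i] for i in neighbor_feat_idx]
    text = export_text(fitted_tree, feature_names=predictor_names)
    reports.append((columns[feat_idx], text))

# Show the first fitted tree together with the column it predicts.
target, text = reports[0]
print(f"Tree predicting {target!r}:")
print(text)
```

With the default settings the list contains one entry per fitted column per imputation round, so later entries for the same column are the refitted trees from subsequent rounds.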