How can I transform with scikit-learn Pipeline when the last estimator is not a transformer?

I have a pipeline and would like to run only the preprocessing and feature-engineering steps, but I cannot call fit_transform() because the final step, RandomForestClassifier(), has no such method.

I've tried using the pipeline's _fit() method (since that is what fit() uses internally), but it raises a KeyError in my transformers.

Here is the pipeline:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# MostFrequentImputer, AggregateCategorical, CategoryConverter, MeanImputer
# and config are project-specific and not shown here.

# pipeline transformations
_pipe = Pipeline(
    [
        (
            "most_frequent_imputer",
            MostFrequentImputer(features=config.model_config.impute_most_freq_cols),
        ),
        (
            "aggregate_high_cardinality_features",
            AggregateCategorical(features=config.model_config.high_cardinality_cats),
        ),
        (
            "get_categorical_codes",
            CategoryConverter(features=config.model_config.convert_to_category_codes),
        ),
        (
            "mean_imputer",
            MeanImputer(features=config.model_config.continuous_features),
        ),
        (
            "random_forest",
            RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=25),
        ),
    ]
)
```
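The custom transformers (MostFrequentImputer, MeanImputer and the others) are not shown in the question. Assuming they follow the usual scikit-learn pattern of subclassing BaseEstimator and TransformerMixin, a hypothetical MeanImputer might look like the sketch below; it is an illustration, not the asker's actual code. The relevant point is that TransformerMixin derives fit_transform() from fit() and transform(), while a classifier such as RandomForestClassifier provides neither transform() nor fit_transform(), so a pipeline ending in it cannot act as a transformer.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier


class MeanImputer(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: fill NaNs in the listed columns with their training means."""

    def __init__(self, features):
        self.features = features

    def fit(self, X, y=None):
        # Learn one mean per listed column (NaNs are skipped)
        self.means_ = X[self.features].mean()
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X[self.features] = X[self.features].fillna(self.means_)
        return X


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# fit_transform() is inherited from TransformerMixin: fit(df).transform(df)
out = MeanImputer(features=["a", "b"]).fit_transform(df)
print(out)

# The transformer has fit_transform(); the classifier has no transform() at all
print(hasattr(MeanImputer, "fit_transform"), hasattr(RandomForestClassifier(), "transform"))
```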

Answer:

You can do the following:

```python
_pipe[:-1].fit_transform(X)
```

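Slicing with [:-1] returns a new Pipeline containing every step except the final estimator, so fit_transform() is available on it. Note that the slice shares its step objects with _pipe, so this fits the preprocessing steps of _pipe itself. If _pipe has already been fitted and you only want the transformed features, call _pipe[:-1].transform(X) instead so nothing is refitted.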
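A self-contained sketch of this behaviour follows. Since the custom transformers are not available, it swaps in built-in SimpleImputer and StandardScaler as stand-ins and uses made-up data. It shows that the slice shares step objects with the original pipeline, how to get transformed features from an already-fitted pipeline without refitting, and a fallback for scikit-learn versions that predate pipeline slicing (which, to my knowledge, was added in 0.21).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data with a few missing values in one column
rng = np.random.default_rng(25)
X = rng.normal(size=(30, 4))
X[::5, 1] = np.nan
y = (X[:, 0] > 0).astype(int)

# Built-in stand-ins for the question's custom transformers
pipe = Pipeline(
    [
        ("mean_imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("random_forest", RandomForestClassifier(n_estimators=100, random_state=25)),
    ]
)

# Option 1: fit only the preprocessing steps and return the transformed features
preprocessor = pipe[:-1]  # a new Pipeline holding the same step objects
Xt = preprocessor.fit_transform(X)

# The slice shares estimator instances, so pipe's own imputer is now fitted too
print(preprocessor.named_steps["mean_imputer"] is pipe.named_steps["mean_imputer"])

# Option 2: fit the whole pipeline, then transform without refitting anything
pipe.fit(X, y)
Xt_fitted = pipe[:-1].transform(X)  # the features the forest was trained on

# Fallback without slicing support: rebuild a Pipeline from the steps list
legacy = Pipeline(pipe.steps[:-1])
print(np.allclose(legacy.transform(X), Xt_fitted))
```

Option 2 is usually what you want after training: it reproduces exactly the input the classifier saw, using the statistics learned during fit().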