I have a pipeline and I would like to run only the preprocessing and feature engineering steps, but I cannot use fit_transform() on it, since RandomForestClassifier() does not have such a method.
I've tried using the _fit() method of the pipeline (as this is what the fit() method uses), but this gives me a KeyError in my transformers.
Here's the pipeline below:
# pipeline transformations
_pipe = Pipeline(
    [
        (
            "most_frequent_imputer",
            MostFrequentImputer(features=config.model_config.impute_most_freq_cols),
        ),
        (
            "aggregate_high_cardinality_features",
            AggregateCategorical(features=config.model_config.high_cardinality_cats),
        ),
        (
            "get_categorical_codes",
            CategoryConverter(features=config.model_config.convert_to_category_codes),
        ),
        (
            "mean_imputer",
            MeanImputer(features=config.model_config.continuous_features),
        ),
        (
            "random_forest",
            RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=25),
        ),
    ]
)
CodePudding user response:
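You can slice the pipeline to drop the final estimator:
_pipe[:-1].fit_transform(X)
This selects every step except the last one and returns a new Pipeline, so fit_transform() is available. It is important to note that the preprocessing steps will be fitted: the slice shares its step objects with _pipe, so the transformers inside the original pipeline are fitted in place.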
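To make the slicing behaviour concrete, here is a minimal sketch. It assumes scikit-learn 0.21 or later (where Pipeline slicing is available) and uses the built-in SimpleImputer and StandardScaler as stand-ins for the custom transformers in the question. The toy data and the names pipe, preprocessing and X_new are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value (hypothetical, for illustration only).
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

# Built-in transformers stand in for the question's custom ones.
pipe = Pipeline(
    [
        ("mean_imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("random_forest", RandomForestClassifier(n_estimators=10, random_state=25)),
    ]
)

# Slicing off the final estimator returns another Pipeline,
# whose last step is a transformer, so fit_transform() works.
preprocessing = pipe[:-1]
X_transformed = preprocessing.fit_transform(X, y)

# The slice is shallow: it holds the same step objects as the full pipeline,
# so the transformers inside `pipe` are now fitted as well.
shares_steps = preprocessing.named_steps["scaler"] is pipe.named_steps["scaler"]

# Once the whole pipeline is fitted, the same slice can transform new data
# without refitting anything.
pipe.fit(X, y)
X_new = np.array([[np.nan, 3.0]])
X_new_transformed = pipe[:-1].transform(X_new)
```

If the preprocessing should be fitted without touching the original pipeline, one option is to slice a copy made with sklearn.base.clone(pipe) instead.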