My dataset consists of 10 features (10 columns) as input, and the last 3 columns are 3 different outputs. If I use a single column as the output, for example y = newDf.iloc[:, 10].values, it works; but if I use all 3 columns, pipe_lr.fit raises an error: y should be a 1d array, got an array of shape (852, 3) instead. How can I pass y?
X = newDf.iloc[:, 0:10].values
y = newDf.iloc[:, 10:13].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
pipe_lr = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        LogisticRegression(random_state=1, solver='lbfgs'))
pipe_lr.fit(X_train, y_train)
CodePudding user response:
The pipeline itself does not care about the format of y; it just hands it over to each step. In your case the final step is the LogisticRegression, which indeed is not set up for multi-output classification. You can manage it using the MultiOutputClassifier wrapper:
pipe_lr = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    MultiOutputClassifier(LogisticRegression(random_state=1, solver='lbfgs'))
)
(There is also a MultiOutputRegressor, and more complicated tools like ClassifierChain and RegressorChain; see the User Guide. However, to my knowledge there is no built-in way to mix regression and classification tasks in one wrapper.)
CodePudding user response:
Simply put, no. What you want is called multi-label (multi-output) learning, and LogisticRegression does not support it directly. You should train three models, one per label.
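The one-model-per-label approach suggested above can be sketched as follows; the synthetic data standing in for newDf is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for newDf: 852 samples, 10 features, 3 binary labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(852, 10))
y = rng.integers(0, 2, size=(852, 3))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Train one classifier per output column.
models = [
    LogisticRegression(random_state=1, solver='lbfgs').fit(X_train, y_train[:, i])
    for i in range(y_train.shape[1])
]

# Stack the per-label predictions back into an (n_samples, 3) array.
preds = np.column_stack([m.predict(X_test) for m in models])
print(preds.shape)
```

This is essentially what MultiOutputClassifier automates, so the two answers converge on the same strategy; the wrapper simply keeps everything inside a single pipeline object.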