Got "MinMaxScaler is expecting 17625 features as input." error while preprocessing dataset

Time:12-20

I'm trying to preprocess a large gene data set in order to predict some targets.

After splitting into train and test sets, I removed every feature that is 0 in more than 25% of the rows of X_train. I then tried MinMaxScaler, but I keep getting the error "X has 23146 features, but MinMaxScaler is expecting 17625 features as input."

If I skip the filtering step, the feature counts match, but my model is inaccurate.

X_train = X_train.loc[:, (X_train != 0).any(axis=0)]
X_train = X_train.loc[:, (X_train == 0).mean() < .25]

mm = MinMaxScaler()
X_train_scaled = mm.fit_transform(X_train)
X_test_scaled = mm.transform(X_test)

This is my code so far. I'm very new to Machine Learning.

CodePudding user response:

You need to apply the same column filter to your test set. The scaler was fitted on the 17625 columns that survived the filtering of X_train, while X_test still has all 23146 columns. It is cleaner to compute the column mask once, from X_train only, and then use it to index both the train and the test data. Your two filters are applied one after the other, so they combine with AND (&), not OR. Note that the first condition is implied by the second (a column with fewer than 25% zeros always has at least one nonzero value), so it never changes the result and could be dropped:

# Boolean mask over columns, computed on the training data only
kept_feats_inds = (X_train != 0).any(axis=0) & \
                  ((X_train == 0).mean(axis=0) < .25)
X_train = X_train.loc[:, kept_feats_inds]
X_test = X_test.loc[:, kept_feats_inds]

mm = MinMaxScaler()
X_train_scaled = mm.fit_transform(X_train)
X_test_scaled = mm.transform(X_test)
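
To see the idea end to end, here is a minimal, self-contained sketch on made-up data. The DataFrame, the `gene_*` column names, and the 30/10 split are assumptions for illustration only, not your actual dataset. It keeps the columns with fewer than 25% zeros in the training split and applies that same mask to both splits before scaling:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the gene data: 40 samples, 6 features.
rng = np.random.default_rng(0)
values = rng.uniform(1.0, 10.0, size=(40, 6))
values[:, 1] = 0.0     # a column that is zero everywhere
values[:15, 3] = 0.0   # a column that is zero in half of the training rows

X = pd.DataFrame(values, columns=[f"gene_{i}" for i in range(6)])
X_train, X_test = X.iloc[:30], X.iloc[30:]

# Compute the mask from the training split only, so no information
# from the test split influences which features are kept.
zero_fraction = (X_train == 0).mean(axis=0)
keep = zero_fraction < 0.25

# Apply the identical mask to both splits.
X_train_f = X_train.loc[:, keep]
X_test_f = X_test.loc[:, keep]

mm = MinMaxScaler()
X_train_scaled = mm.fit_transform(X_train_f)
X_test_scaled = mm.transform(X_test_f)  # same number of columns as the fit

print(X_train_scaled.shape, X_test_scaled.shape)  # (30, 4) (10, 4)
```

Because the mask is a boolean Series indexed by column name, `X_test.loc[:, keep]` selects exactly the columns that were kept in `X_train`, which is what the fitted scaler expects.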