I am doing binary classification of 1000 molecules, using their SMILES strings as input. The dataset is the HIV dataset from the Biophysics category on <moleculenet.org>. First I tokenized the SMILES strings and then padded them:
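```python
import tensorflow as tf

# df: the MoleculeNet HIV data as a DataFrame (columns include smiles and HIV_active)
data1 = df[:1000]  # first 1000 molecules

# vocab_size is defined earlier in the notebook (not shown here)
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    vocab_size, filters="", char_level=True)  # one token per character
tokenizer.fit_on_texts(data1.smiles)
seqs = tokenizer.texts_to_sequences(data1.smiles)
padded_seqs = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding="post")
```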
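To see why the width of `padded_seqs` depends on the data, here is a minimal pure-Python sketch of character-level tokenization followed by post-padding. It is only a rough analogue of `Tokenizer(char_level=True)` plus `pad_sequences(padding="post")`, not the Keras implementation; the function name `char_tokenize_and_pad` is hypothetical. Without a `maxlen`, every sequence is padded to the longest one, so the number of columns equals the length of the longest SMILES string. One difference worth knowing: Keras' `Tokenizer` lowercases by default (`lower=True`), so with the settings above aromatic `c` and aliphatic `C` share one id, whereas this sketch keeps case (as `lower=False` would).

```python
from collections import Counter


def char_tokenize_and_pad(texts):
    """Character-level tokenization followed by post-padding with 0.

    Case-sensitive sketch: ids are assigned by descending character frequency
    (1 = most frequent, 0 reserved for padding). Exact ids may differ from Keras.
    """
    counts = Counter(ch for text in texts for ch in text)
    vocab = {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}
    seqs = [[vocab[ch] for ch in text] for text in texts]
    # Without a maxlen, every sequence is padded to the longest one
    width = max(len(s) for s in seqs)
    padded = [s + [0] * (width - len(s)) for s in seqs]
    return padded, vocab


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
padded, vocab = char_tokenize_and_pad(smiles)
print(len(padded), len(padded[0]))  # 3 rows, 8 columns
```

Here the longest string, `c1ccccc1`, has 8 characters, so every row gets 8 columns; for the first 1000 HIV molecules the same rule gives 174 columns.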
Then I built a dense model and used binary cross-entropy as the loss function:
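```python
from keras.wrappers.scikit_learn import KerasClassifier  # deprecated in favor of SciKeras
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential


def create_model():
    # Build a fresh model on every call so each CV fold starts untrained
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


create_model().summary()
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_model, epochs=100,
                                          batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, padded_seqs, data1.HIV_active, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
```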
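For reference, the `binary_crossentropy` loss paired with a single sigmoid output computes, per sample, `-(y*log(p) + (1-y)*log(1-p))` and averages over the batch. The sketch below is plain Python (the function name is hypothetical); it clips `p` with a small epsilon so `log` never receives 0, similar in spirit to the clipping Keras applies, though this is not Keras' own code.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p)).

    p is clipped to [eps, 1 - eps] so log() never receives 0.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# An uninformative prediction of 0.5 costs log(2) ≈ 0.693 per sample
print(binary_crossentropy([1, 0], [0.5, 0.5]))
```

Confident correct predictions drive the loss toward 0, while confident wrong ones are penalized heavily, which is why this loss suits a sigmoid output layer.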
Here is my error log:
```
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: DeprecationWarning: KerasClassifier is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead.
  This is separate from the ipykernel package so we can avoid doing imports until
Standardized: nan% (nan%)
/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning:
10 fits failed out of a total of 10.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
10 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.7/dist-packages/keras/wrappers/scikit_learn.py", line 232, in fit
    return super(KerasClassifier, self).fit(x, y, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/wrappers/scikit_learn.py", line 164, in fit
    history = self.model.fit(x, y, **fit_args)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 878, in train_function *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 867, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in run_step **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 808, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/input_spec.py", line 263, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" is '

    ValueError: Input 0 of layer "sequential_3" is incompatible with the layer: expected shape=(None, 60), found shape=(5, 174)

  warnings.warn(some_fits_failed_message, FitFailedWarning)
```
The `ValueError` at the end is the main error. Sorry for the long code; please tell me if my question is unclear or badly worded.
Answer:
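Your input `padded_seqs` has shape `(1000, 174)`: `pad_sequences` padded every SMILES string to the length of the longest one, which is 174 characters. The traceback says the same thing: the first layer expects `(None, 60)` but receives batches of shape `(5, 174)`, where 5 is your `batch_size`. So the input dimension must be 174, not 60:

```python
model.add(tf.keras.layers.Dense(60, input_dim=174, activation='relu'))
```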
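The rule behind the fix: a Dense layer computes `relu(x @ W + b)` with a kernel `W` of shape `(input_dim, units)`, so each input row must have exactly `input_dim` values. Keras checks this up front in `assert_input_compatibility` (the last frame of the traceback). The plain-Python sketch below mimics that check and the forward pass; `dense_relu` is a hypothetical name and this is not how Keras implements the layer.

```python
def dense_relu(batch, weights, bias):
    """One Dense layer with ReLU on plain lists: relu(x @ W + b).

    batch:   rows of length in_features
    weights: in_features rows, each of length units (W has shape (in_features, units))
    bias:    list of length units
    """
    in_features = len(weights)
    units = len(bias)
    for row in batch:
        if len(row) != in_features:
            # The condition behind the Keras error: row width != input_dim
            raise ValueError(
                f"expected shape=(None, {in_features}), "
                f"found shape=({len(batch)}, {len(row)})"
            )
    return [
        [max(0.0, sum(row[i] * weights[i][j] for i in range(in_features)) + bias[j])
         for j in range(units)]
        for row in batch
    ]


batch = [[1.0] * 174 for _ in range(5)]  # batch_size=5, 174 padded tokens each
bias = [0.0] * 60

try:
    dense_relu(batch, [[0.0] * 60 for _ in range(60)], bias)  # input_dim=60
except ValueError as err:
    print(err)  # expected shape=(None, 60), found shape=(5, 174)

out = dense_relu(batch, [[0.01] * 60 for _ in range(174)], bias)  # input_dim=174
print(len(out), len(out[0]))  # 5 60
```

In the real model, writing `input_dim=padded_seqs.shape[1]` instead of the literal 174 keeps the first layer in sync with whatever width `pad_sequences` produces if the data subset changes.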