ValueError: Found input variables with inconsistent numbers of samples: [4, 304]-CodePudding

i tried to make a confusion matrix from the model that i make, all seems fine till making the model until i approach a error that says

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-23313a51ba02> in <module>()
      3 y_test=np.argmax(Y_test, axis=0)
      4 from sklearn.metrics import confusion_matrix
----> 5 confusion_matrix(y_test, y_pred)
      6 import seaborn as sns
      7 import matplotlib.pyplot as plt

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    300 
    301     """
--> 302     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    303     if y_type not in ("binary", "multiclass"):
    304         raise ValueError("%s is not supported" % y_type)

/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     82     y_pred : array or indicator matrix
     83     """
---> 84     check_consistent_length(y_true, y_pred)
     85     type_true = type_of_target(y_true)
     86     type_pred = type_of_target(y_pred)

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    331         raise ValueError(
    332             "Found input variables with inconsistent numbers of samples: %r"
--> 333             % [int(l) for l in lengths]
    334         )
    335 

ValueError: Found input variables with inconsistent numbers of samples: [4, 304]

here are the code that i use

# Convert List to numpy array, for Keras use
Train_label = np.eye(n_labels)[label] # One-hot encoding by np array function
Train_data = np.array(data)
print("Dataset shape is",Train_data.shape, "(size, timestep, column, row, channel)")
print("Label shape is",Train_label.shape,"(size, label onehot vector)")
# shuffling dataset for input fit function
# if don`t, can`t train model entirely
x = np.arange(Train_label.shape[0])
np.random.shuffle(x)
# same order shuffle is needed
Train_label = Train_label[x]
Train_data = Train_data[x]

train_size = 0.9
X_train=Train_data[:int(Totalnb * 0.9),:]
Y_train=Train_label[:int(Totalnb * 0.9)]
X_test=Train_data[int(Totalnb * 0.1):,:]
Y_test=Train_label[int(Totalnb * 0.1):]
# 2. Buliding a Model
# declare input layer for CNN LSTM architecture
video = Input(shape=(timesteps,img_col,img_row,img_channel))
STEPS_PER_EPOCH = 120
#AlexNet Layer
model = tf.keras.models.Sequential([
    # 1st conv
  tf.keras.layers.Conv2D(96, (11,11),strides=(4,4), activation='relu', input_shape=(img_col, img_row, img_channel)),
  tf.keras.layers.BatchNormalization(),
  tf.keras.layers.MaxPooling2D(2, strides=(2,2)),
    # 2nd conv
  tf.keras.layers.Conv2D(256, (5,5),strides=(1,1), activation='relu',padding="same"),
  tf.keras.layers.BatchNormalization(),
  tf.keras.layers.MaxPooling2D(2, strides=(2,2)),
     # 3rd conv
  tf.keras.layers.Conv2D(384, (3,3),strides=(1,1), activation='relu',padding="same"),
  tf.keras.layers.BatchNormalization(),
    # 4th conv
  tf.keras.layers.Conv2D(384, (3,3),strides=(1,1), activation='relu',padding="same"),
  tf.keras.layers.BatchNormalization(),
    # 5th Conv
  tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), activation='relu',padding="same"),
  tf.keras.layers.BatchNormalization(),
  tf.keras.layers.MaxPooling2D(2, strides=(2,2)),
])
model.trainable = True
# FC Dense Layer
x = model.output
x = Flatten()(x)
cnn_out = Dense(128)(x)
# Construct CNN model 
Lstm_inp = Model(model.input, cnn_out)
# Distribute CNN output by timesteps 
encoded_frames = TimeDistributed(Lstm_inp)(video)
# Contruct LSTM model 
encoded_sequence = LSTM(256)(encoded_frames)
hidden_Drop = Dropout(0.2)(encoded_sequence)
hidden_layer = Dense(128)(hidden_Drop)
outputs = Dense(n_labels, activation="softmax")(hidden_layer)
# Contruct CNN LSTM model 
model = Model([video], outputs)
# 3. Setting up the Model Learning Process
# Model Compile 
opt = SGD(lr=0.01)
model.compile(loss = "categorical_crossentropy", optimizer = opt, metrics=['accuracy'])
model.summary()
# 4. Training the Model
hist = model.fit(X_train, Y_train, batch_size=batch_size, validation_split=validation_ratio, shuffle=True, epochs=num_epochs)

Y_pred2 = model.predict(X_test)
y_pred= np.argmax(Y_pred2, axis=1) # prediksi
y_test=np.argmax(Y_test, axis=0)
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred) 
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(8,5))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt=".0f", ax=ax)
plt.xlabel("Y_head")
plt.ylabel("Y_true")
plt.show()
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

everything seems fine and work but the error come out when i try to make the confussion matrix in the line confusion_matrix(y_test, y_pred)

i still cant figure what might be the problem

hope anyone can help me with this

thank you so much guys

CodePudding user response：

Posting my comments as answer for completeness:

One possible thing that looks a bit weird is that you take different axis when calculating the argmax for y_pred and y_test. But that might be ok depending on your data layout.

y_test and y_pred seem be be of different lengths. Can you check the shapes of Y_pred2 and Y_test and see if the axes over which you calculate the argmax are correct.