Keras ImageDataGenerator with flow, Got ValueError: `x` (images tensor) and `y` (labels) should have

I'm working through the CNN course on Coursera, and when I try to solve the assignment notebook, I get this error:

ValueError: `x` (images tensor) and `y` (labels) should have the same length. Found: x.shape = (27455, 28, 28, 1), y.shape = (7172, 28, 28, 1)

But aren't they the same length (in their dimensions, that is)? This is the code block that causes the issue:

# GRADED FUNCTION: train_val_generators
def train_val_generators(training_images, training_labels, validation_images, validation_labels):
  """
  Creates the training and validation data generators
  
  Args:
    training_images (array): parsed images from the train CSV file
    training_labels (array): parsed labels from the train CSV file
    validation_images (array): parsed images from the test CSV file
    validation_labels (array): parsed labels from the test CSV file
    
  Returns:
    train_generator, validation_generator - tuple containing the generators
  """
  ### START CODE HERE

  # In this section, you will have to add another dimension to the data
  # So, for example, if your array is (10000, 28, 28)
  # You will need to make it (10000, 28, 28, 1)
  # Hint: np.expand_dims

  training_images = np.expand_dims(training_images, axis=3)
  validation_images = np.expand_dims(validation_images, axis=3)

  print(training_images.shape)
  print(validation_images.shape)

  

  # Instantiate the ImageDataGenerator class 
  # Don't forget to normalize pixel values 
  # and set arguments to augment the images (if desired)
  train_datagen = ImageDataGenerator(
    # Your Code Here
    rescale = 1./255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
    )


  # Pass in the appropriate arguments to the flow method
  train_generator = train_datagen.flow(x=training_images,
                                       y=validation_images,
                                       batch_size=32) 

  
  # Instantiate the ImageDataGenerator class (don't forget to set the rescale argument)
  # Remember that validation data should not be augmented
  validation_datagen = ImageDataGenerator(
      rescale = 1./255
  )

  # Pass in the appropriate arguments to the flow method
  validation_generator = validation_datagen.flow(x=training_images,
                                                 y=validation_images,
                                                 batch_size=32) 

  ### END CODE HERE

  return train_generator, validation_generator

Running this cell works fine, and the function adds the extra dimension to my images. The following code cell raises the error mentioned above.

# Test your generators
train_generator, validation_generator = train_val_generators(training_images, training_labels, validation_images, validation_labels)

print(f"Images of training generator have shape: {train_generator.x.shape}")
print(f"Labels of training generator have shape: {train_generator.y.shape}")
print(f"Images of validation generator have shape: {validation_generator.x.shape}")
print(f"Labels of validation generator have shape: {validation_generator.y.shape}")

This is my entire error message:

(27455, 28, 28, 1)
(7172, 28, 28, 1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-c93bf4854fbc> in <module>()
      1 # Test your generators
----> 2 train_generator, validation_generator = train_val_generators(training_images, training_labels, validation_images, validation_labels)
      3 
      4 print(f"Images of training generator have shape: {train_generator.x.shape}")
      5 print(f"Labels of training generator have shape: {train_generator.y.shape}")

3 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/image/numpy_array_iterator.py in __init__(self, x, y, image_data_generator, batch_size, shuffle, sample_weight, seed, data_format, save_to_dir, save_prefix, save_format, subset, dtype)
     87                              'should have the same length. '
     88                              'Found: x.shape = %s, y.shape = %s' %
---> 89                              (np.asarray(x).shape, np.asarray(y).shape))
     90         if sample_weight is not None and len(x) != len(sample_weight):
     91             raise ValueError('`x` (images tensor) and `sample_weight` '

ValueError: `x` (images tensor) and `y` (labels) should have the same length. Found: x.shape = (27455, 28, 28, 1), y.shape = (7172, 28, 28, 1)

I searched for similar problems on Stack Overflow. The answers talked about changing the dimensions, but I think my dimensions are correct, because changing them did absolutely nothing. Any insights on this? Please help. Thanks! :_)

CodePudding user response:

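In train_datagen.flow and validation_datagen.flow, you make two small mistakes. For the y parameter, you pass validation_images in both calls, but you need to pass training_labels and validation_labels, respectively. The validation generator also receives x=training_images; it should receive validation_images. The error compares the number of samples (the first dimension), which is 27455 for x and 7172 for y.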
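To make the message concrete, here is a minimal sketch of the length check quoted in the traceback. The helper name check_same_length is hypothetical (it is not part of Keras), and the array sizes are small stand-ins chosen only so the example runs quickly. It compares len(x) with len(y), which is the number of samples, not the image dimensions.

```python
import numpy as np


def check_same_length(x, y):
    """Hypothetical helper mirroring the length check shown in the traceback."""
    if y is not None and len(x) != len(y):
        raise ValueError(
            "`x` (images tensor) and `y` (labels) should have the same length. "
            "Found: x.shape = %s, y.shape = %s"
            % (np.asarray(x).shape, np.asarray(y).shape)
        )


# Small stand-ins for the real arrays (sizes are arbitrary, not the real ones).
training_images = np.zeros((6, 28, 28, 1))
training_labels = np.zeros(6)
validation_images = np.zeros((4, 28, 28, 1))

# Images paired with their own labels: 6 == 6, so nothing is raised.
check_same_length(training_images, training_labels)

# Training images paired with validation images: 6 != 4, so the check fails.
try:
    check_same_length(training_images, validation_images)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

print(mismatch_message)
```

The second call fails for the same reason as the notebook cell: the trailing dimensions (28, 28, 1) match, but the sample counts do not.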

I corrected these mistakes and wrote complete code that uses random images and a simple CNN model, then fits it.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf
import numpy as np


def train_val_generators(training_images, training_labels, validation_images, validation_labels):
  training_images = np.expand_dims(training_images, axis=3)
  validation_images = np.expand_dims(validation_images, axis=3)

  print(training_images.shape)
  print(validation_images.shape)

  train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
    )

  train_generator = train_datagen.flow(x=training_images,
                                       y=training_labels,
                                       batch_size=32) 

  
  validation_datagen = ImageDataGenerator(
      rescale = 1./255
  )

  validation_generator = validation_datagen.flow(x=validation_images,
                                                 y=validation_labels,
                                                 batch_size=32) 

  return train_generator, validation_generator


train_generator, validation_generator = train_val_generators(
    training_images = np.random.rand(27455, 28,28), 
    training_labels = np.random.randint(0,2,27455),
    validation_images = np.random.rand(7172, 28,28),
    validation_labels = np.random.randint(0,2,7172),
    )

model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dropout(0.4),
  tf.keras.layers.Dense(2)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))


model.fit(train_generator,
          epochs=2,
          validation_data=validation_generator)

Output:

(27455, 28, 28, 1)
(7172, 28, 28, 1)
Epoch 1/2
858/858 [==============================] - 25s 25ms/step - loss: 0.6933 - val_loss: 0.6931
Epoch 2/2
858/858 [==============================] - 18s 21ms/step - loss: 0.6932 - val_loss: 0.6930
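
The loss staying near 0.693 is expected here and does not indicate a problem with the generators: the images and labels above are random, so the model cannot do better than guessing. A quick check, assuming the two classes are equally likely (which np.random.randint(0, 2, ...) gives on average), shows that the cross-entropy of a 50/50 guess is ln(2):

```python
import math

# With two equally likely classes and no learnable signal, the best
# prediction is probability 0.5 for each class.
num_classes = 2
chance_loss = -math.log(1.0 / num_classes)

print(f"{chance_loss:.4f}")  # 0.6931
```

With the real assignment data, the loss should drop below this value as the model learns.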