Creating a custom Keras data generator to train my model
See more on creating a custom data generator here: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
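For context, the generator subclasses keras.utils.Sequence as in that guide. A minimal skeleton of the class around the method below (the class name and constructor here are a paraphrase of my setup, not the exact code):

import numpy as np
from tensorflow import keras

class VideoWindowGenerator(keras.utils.Sequence):
    def __init__(self, vid_info, classes, batch_size):
        self.vid_info = vid_info      # one entry per file, each holding that file's windows
        self.classes = classes        # list of class names
        self.batch_size = batch_size  # number of files per batch

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.vid_info) / self.batch_size))

    def __getitem__(self, idx):
        # builds and returns one (images, labels) batch; full version below
        ...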
I have the __getitem__ method defined as follows:
def __getitem__(self, idx):
    classes = self.classes
    shape = self.target_shape
    nbframe = self.nbframe

    batchImages = []
    batchLabels = []
    indexes = self.vid_info[idx * self.batch_size:(idx + 1) * self.batch_size]
    print("INDEXES GET ITEM", len(indexes))

    # for all windows of a single file
    # for each file
    for i in indexes:
        fileLabels = []
        fileImages = []
        # for each window in each file
        for x in i:
            vid = x
            folderPath = vid.get('name')
            classname = self._get_classname(folderPath)

            # create a label array and set 1 to the right column
            label = np.zeros(len(classes))
            col = classes.index(classname)
            label[col] = 1.

            video_id = vid['id']
            frame_indexes = vid['frames']
            total_frames = vid['frame_count']
            window_images = vid['images']

            # append frames
            fileLabels.append(label)
            fileImages.append(window_images)

        print("BATCH fileLabels SHAPE", np.array(fileLabels).shape)
        print("BATCH fileImages SHAPE", np.array(fileImages).shape)
        batchLabels.append(fileLabels)
        batchImages.append(fileImages)

        print("BATCH LABELS SHAPE", np.array(batchLabels).shape)
        print("BATCH IMAGES SHAPE", np.array(batchImages).shape)

    batchImages, batchLabels = np.array(batchImages, dtype=object), np.array(batchLabels, dtype=object)
    print("OUTER SHAPES", batchImages.shape, batchLabels.shape)

    batchImages = np.asarray(batchImages).astype(np.float32)
    batchLabels = np.asarray(batchLabels).astype(np.float32)
    return batchImages, batchLabels
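For clarity on the data layout (the shapes match the print output below): each batch slices out 2 files, each file contributes a different number of windows, and every window is a stack of 10 frames of 128x128x2 with a one-hot label over 2 classes. A rough sketch of what one batch ends up holding, with dummy zero arrays standing in for the real frames:

import numpy as np

# dummy stand-ins matching the printed shapes: file 1 has 42 windows,
# file 2 has 8 windows, each window is (10, 128, 128, 2)
file1_images = np.zeros((42, 10, 128, 128, 2), dtype=np.float32)
file1_labels = np.zeros((42, 2), dtype=np.float32)
file2_images = np.zeros((8, 10, 128, 128, 2), dtype=np.float32)
file2_labels = np.zeros((8, 2), dtype=np.float32)

# the nested structure batchImages/batchLabels hold before the cast:
# two per-file entries with different first dimensions (42 vs. 8)
batchImages = [file1_images, file2_images]
batchLabels = [file1_labels, file2_labels]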
The traceback I keep receiving is the following:
INDEXES GET ITEM 2
BATCH fileLabels SHAPE (42, 2)
BATCH fileImages SHAPE (42, 10, 128, 128, 2)
BATCH LABELS SHAPE (1, 42, 2)
BATCH IMAGES SHAPE (1, 42, 10, 128, 128, 2)
BATCH fileLabels SHAPE (8, 2)
BATCH fileImages SHAPE (8, 10, 128, 128, 2)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:178: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH LABELS SHAPE",np.array(batchLabels).shape)
BATCH LABELS SHAPE (2,)
BATCH IMAGES SHAPE (2,)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:179: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH IMAGES SHAPE",np.array(batchImages).shape)
OUTER SHAPES (2,) (2,)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "experiment8.1.py", line 213, in <module>
    history = model.fit(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/wandb/integration/keras/keras.py", line 150, in new_v2
    return old_v2(*args, **kwargs)
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1045, in fit
    data_handler = data_adapter.DataHandler(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1100, in __init__
    self._adapter = adapter_cls(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 902, in __init__
    super(KerasSequenceAdapter, self).__init__(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 779, in __init__
    peek, x = self._peek_and_restore(x)
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 913, in _peek_and_restore
    return x[0], x
  File "/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py", line 186, in __getitem__
    batchImages = np.asarray(batchImages).astype(np.float32)
ValueError: setting an array element with a sequence.
I believe the error occurs because I am appending multi-dimensional arrays to batchImages and batchLabels.
First iteration
Specifically, the following print statements are displayed on the first iteration:
INDEXES GET ITEM 2
BATCH fileLabels SHAPE (42, 2)
BATCH fileImages SHAPE (42, 10, 128, 128, 2)
BATCH LABELS SHAPE (1, 42, 2)
BATCH IMAGES SHAPE (1, 42, 10, 128, 128, 2)
Second iteration
Then on the second iteration you can see that the shape of batchLabels and batchImages changes to (2,):
BATCH fileLabels SHAPE (8, 2)
BATCH fileImages SHAPE (8, 10, 128, 128, 2)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:178: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH LABELS SHAPE",np.array(batchLabels).shape)
BATCH LABELS SHAPE (2,)
BATCH IMAGES SHAPE (2,)
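This matches how NumPy treats ragged nesting: a list of two arrays whose first dimensions differ (42 vs. 8) can only become a 1-D object array of length 2, and that object array cannot be cast element-wise to float32. A minimal reproduction with reduced shapes:

import numpy as np

a = np.zeros((42, 2))   # stands in for the first file's labels
b = np.zeros((8, 2))    # stands in for the second file's labels

ragged = np.array([a, b], dtype=object)
print(ragged.shape)     # -> (2,): one object element per file

# the same operation that fails in __getitem__: each element is itself
# an array, so it cannot be converted into a single float32 value
ragged.astype(np.float32)   # raises "setting an array element with a sequence"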
I am unsure how to properly return this data for training the TensorFlow model.
EDIT:
I found a link explaining that I am getting this error because I am trying to convert a tuple to a NumPy array: https://stackoverflow.com/a/47482672/7989522
I still don't understand how to move forward, because if I remove the lines
batchImages = np.asarray(batchImages).astype(np.float32)
batchLabels = np.asarray(batchLabels).astype(np.float32)
Then I get the error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
CodePudding user response:
I just set batch_size = 1 and it works. That is still not an optimal solution, because I literally can't add more samples to a batch; I can only process one sample at a time, so my GPU utilization suffers.
I followed this reference: "simply set the input sequence for the LSTM to (None, features) and use batch_size as 1."
Train and predict on variable length sequences
It does work, however, so at least now I can move forward with building the rest of the model.
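For anyone following along, a rough sketch of what the batch_size = 1 version of __getitem__ boils down to (inside the same generator class as above, with numpy imported as np; names reused from the question's code). With only one file per batch there is nothing ragged to stack, so the float32 cast goes through:

def __getitem__(self, idx):
    # batch_size == 1: one file per batch, so index instead of slicing
    file_windows = self.vid_info[idx]

    images, labels = [], []
    for vid in file_windows:
        label = np.zeros(len(self.classes))
        label[self.classes.index(self._get_classname(vid.get('name')))] = 1.
        labels.append(label)
        images.append(vid['images'])

    # shapes: (1, n_windows, 10, 128, 128, 2) and (1, n_windows, 2);
    # n_windows varies per file, which is what the (None, features)
    # input shape on the LSTM side accommodates
    return (np.asarray(images, dtype=np.float32)[np.newaxis, ...],
            np.asarray(labels, dtype=np.float32)[np.newaxis, ...])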