Creating a custom Keras data generator to train my model
See more on creating a custom data generator here: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
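For context, the generator subclasses keras.utils.Sequence as in that guide. A minimal skeleton of the class around the method below (the class name and constructor here are a paraphrase of my setup, not the exact code):

import numpy as np
from tensorflow import keras

class VideoWindowGenerator(keras.utils.Sequence):
    def __init__(self, vid_info, classes, batch_size):
        self.vid_info = vid_info      # one entry per file, each holding that file's windows
        self.classes = classes        # list of class names
        self.batch_size = batch_size  # number of files per batch

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.vid_info) / self.batch_size))

    def __getitem__(self, idx):
        # builds and returns one (images, labels) batch; full version below
        ...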
I have the __getitem__ method defined as follows:
def __getitem__(self, idx):
    classes = self.classes
    shape = self.target_shape
    nbframe = self.nbframe

    batchImages = []
    batchLabels = []
    indexes = self.vid_info[idx * self.batch_size:(idx + 1) * self.batch_size]
    print("INDEXES GET ITEM", len(indexes))

    # for all windows of a single file
    # for each file
    for i in indexes:
        fileLabels = []
        fileImages = []
        # for each window in each file
        for x in i:
            vid = x
            folderPath = vid.get('name')
            classname = self._get_classname(folderPath)

            # create a label array and set 1 to the right column
            label = np.zeros(len(classes))
            col = classes.index(classname)
            label[col] = 1.

            video_id = vid['id']
            frame_indexes = vid['frames']
            total_frames = vid['frame_count']
            window_images = vid['images']

            # append frames
            fileLabels.append(label)
            fileImages.append(window_images)

        print("BATCH fileLabels SHAPE", np.array(fileLabels).shape)
        print("BATCH fileImages SHAPE", np.array(fileImages).shape)
        batchLabels.append(fileLabels)
        batchImages.append(fileImages)

        print("BATCH LABELS SHAPE", np.array(batchLabels).shape)
        print("BATCH IMAGES SHAPE", np.array(batchImages).shape)

    batchImages, batchLabels = np.array(batchImages, dtype=object), np.array(batchLabels, dtype=object)
    print("OUTER SHAPES", batchImages.shape, batchLabels.shape)

    batchImages = np.asarray(batchImages).astype(np.float32)
    batchLabels = np.asarray(batchLabels).astype(np.float32)
    return batchImages, batchLabels
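For clarity on the data layout (the shapes match the print output below): each batch slices out 2 files, each file contributes a different number of windows, and every window is a stack of 10 frames of 128x128x2 with a one-hot label over 2 classes. A rough sketch of what one batch ends up holding, with dummy zero arrays standing in for the real frames:

import numpy as np

# dummy stand-ins matching the printed shapes: file 1 has 42 windows,
# file 2 has 8 windows, each window is (10, 128, 128, 2)
file1_images = np.zeros((42, 10, 128, 128, 2), dtype=np.float32)
file1_labels = np.zeros((42, 2), dtype=np.float32)
file2_images = np.zeros((8, 10, 128, 128, 2), dtype=np.float32)
file2_labels = np.zeros((8, 2), dtype=np.float32)

# the nested structure batchImages/batchLabels hold before the cast:
# two per-file entries with different first dimensions (42 vs. 8)
batchImages = [file1_images, file2_images]
batchLabels = [file1_labels, file2_labels]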
The traceback I keep receiving is the following:
INDEXES GET ITEM 2
BATCH fileLabels SHAPE (42, 2)
BATCH fileImages SHAPE (42, 10, 128, 128, 2)
BATCH LABELS SHAPE (1, 42, 2)
BATCH IMAGES SHAPE (1, 42, 10, 128, 128, 2)
BATCH fileLabels SHAPE (8, 2)
BATCH fileImages SHAPE (8, 10, 128, 128, 2)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:178: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH LABELS SHAPE",np.array(batchLabels).shape)
BATCH LABELS SHAPE (2,)
BATCH IMAGES SHAPE (2,)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:179: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH IMAGES SHAPE",np.array(batchImages).shape)
OUTER SHAPES (2,) (2,)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "experiment8.1.py", line 213, in <module>
    history = model.fit(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/wandb/integration/keras/keras.py", line 150, in new_v2
    return old_v2(*args, **kwargs)
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1045, in fit
    data_handler = data_adapter.DataHandler(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1100, in __init__
    self._adapter = adapter_cls(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 902, in __init__
    super(KerasSequenceAdapter, self).__init__(
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 779, in __init__
    peek, x = self._peek_and_restore(x)
  File "/opt/conda/envs/nnml38-2/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 913, in _peek_and_restore
    return x[0], x
  File "/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py", line 186, in __getitem__
    batchImages = np.asarray(batchImages).astype(np.float32)
ValueError: setting an array element with a sequence.
I believe the error occurs because I am appending multi-dimensional arrays to batchImages and batchLabels.
First iteration
Specifically, the following print statements are displayed on the first iteration:
INDEXES GET ITEM 2
BATCH fileLabels SHAPE (42, 2)
BATCH fileImages SHAPE (42, 10, 128, 128, 2)
BATCH LABELS SHAPE (1, 42, 2)
BATCH IMAGES SHAPE (1, 42, 10, 128, 128, 2)
Second iteration
Then on the second iteration you can see that the shape of batchLabels and batchImages changes to (2,):
BATCH fileLabels SHAPE (8, 2)
BATCH fileImages SHAPE (8, 10, 128, 128, 2)
/root/NeuralNetwork_Research/keras_video_generator/src/keras_video/queue_train.py:178: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
print("BATCH LABELS SHAPE",np.array(batchLabels).shape)
BATCH LABELS SHAPE (2,)
BATCH IMAGES SHAPE (2,)
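This matches how NumPy treats ragged nesting: a list of two arrays whose first dimensions differ (42 vs. 8) can only become a 1-D object array of length 2, and that object array cannot be cast element-wise to float32. A minimal reproduction with reduced shapes:

import numpy as np

a = np.zeros((42, 2))   # stands in for the first file's labels
b = np.zeros((8, 2))    # stands in for the second file's labels

ragged = np.array([a, b], dtype=object)
print(ragged.shape)     # -> (2,): one object element per file

# the same operation that fails in __getitem__: each element is itself
# an array, so it cannot be converted into a single float32 value
ragged.astype(np.float32)   # raises "setting an array element with a sequence"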
I am unsure how to properly return this data for training the TensorFlow model.
EDIT:
I found a link explaining that I am getting this error because I am trying to convert a tuple to a NumPy array: https://stackoverflow.com/a/47482672/7989522
I still don't understand how to move forward, because if I remove the lines
batchImages = np.asarray(batchImages).astype(np.float32)
batchLabels = np.asarray(batchLabels).astype(np.float32)
Then I get the error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
CodePudding user response:
I just set batch_size = 1 and it works. That is still not an optimal solution, because I literally can't add more samples to a batch; I can only process one sample at a time, so my GPU utilization suffers.
I followed this reference: "simply set the input sequence for the LSTM to (None, features) and use batch_size as 1."
Train and predict on variable length sequences
It does work, however, so at least now I can move forward with building the rest of the model.
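For anyone following along, a rough sketch of what the batch_size = 1 version of __getitem__ boils down to (inside the same generator class as above, with numpy imported as np; names reused from the question's code). With only one file per batch there is nothing ragged to stack, so the float32 cast goes through:

def __getitem__(self, idx):
    # batch_size == 1: one file per batch, so index instead of slicing
    file_windows = self.vid_info[idx]

    images, labels = [], []
    for vid in file_windows:
        label = np.zeros(len(self.classes))
        label[self.classes.index(self._get_classname(vid.get('name')))] = 1.
        labels.append(label)
        images.append(vid['images'])

    # shapes: (1, n_windows, 10, 128, 128, 2) and (1, n_windows, 2);
    # n_windows varies per file, which is what the (None, features)
    # input shape on the LSTM side accommodates
    return (np.asarray(images, dtype=np.float32)[np.newaxis, ...],
            np.asarray(labels, dtype=np.float32)[np.newaxis, ...])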