Tensorflow can't append batches together after doing the first epoch-CodePudding

I am running into problems with my code after I removed the loss function of the compile step (set it equal to loss=None) and added one with the intention of adding another, loss function through the add_loss method. I can call fit and it trains for one epoch but then I get this error:

ValueError: operands could not be broadcast together with shapes (128,) (117,) (128,)

My batch size is 128. It looks like 117 is somehow dependent on the number of examples that I am using. When I vary the number of examples, I get different numbers from 117. They are all my number of examples mod my batch size. I am at a loss about how to fix this issue. I am using tf.data.TFRecordDataset as input.

I have the following simplified model:

class MyModel(Model):

  def __init__(self):
    super(MyModel, self).__init__()

    encoder_input = layers.Input(shape=INPUT_SHAPE, name='encoder_input')
    x = encoder_input
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)

    encoded = layers.Dense(LATENT_DIM, name='encoded')(x)

    self.encoder = Model(encoder_input, outputs=[encoded])

    self.decoder = tf.keras.Sequential([
      layers.Input(shape=LATENT_DIM),
      layers.Dense(32 * 32 * 32),
      layers.Reshape((32, 32, 32)),
      layers.Conv2DTranspose(32, kernel_size=3, strides=2, activation='relu', padding='same'),
      layers.Conv2DTranspose(64, kernel_size=3, strides=2, activation='relu', padding='same'),
      layers.Conv2D(3, kernel_size=(3, 3), activation='sigmoid', padding='same')])

  def call(self, x):
    encoded = self.encoder(x)

    decoded = self.decoder(encoded)

    # Loss function. Has to be here because I intend to add another, more layer-interdependent, loss function.
    r_loss = tf.math.reduce_sum(tf.math.square(x - decoded), axis=[1, 2, 3])
    self.add_loss(r_loss)

    return decoded


def read_tfrecord(example):
  example = tf.io.parse_single_example(example, CELEB_A_FORMAT)
  image = decode_image(example['image'])

  return image, image

def load_dataset(filenames, func):
  dataset = tf.data.TFRecordDataset(
    filenames
  )

  dataset = dataset.map(partial(func), num_parallel_calls=tf.data.AUTOTUNE)

  return dataset

def train_autoencoder():
  filenames_train = glob.glob(TRAIN_PATH)
  train_dataset_x_x = load_dataset(filenames_train[:4], func=read_tfrecord)

  autoencoder = Autoencoder()

  # The loss function used to be defined here and everything worked fine before.
  def r_loss(y_true, y_pred):
    return tf.math.reduce_sum(tf.math.square(y_true - y_pred), axis=[1, 2, 3])

  optimizer = tf.keras.optimizers.Adam(1e-4)

  autoencoder.compile(optimizer=optimizer, loss=None)

  autoencoder.fit(train_dataset_x_x.batch(AUTOENCODER_BATCH_SIZE),
                  epochs=AUTOENCODER_NUM_EPOCHS,
                  shuffle=True)

CodePudding user response：

If you only want to get rid of the error and don't care about the last "remainder" batch of your dataset, you can use the keyword argument drop_remainder=True inside of train_dataset_x_x.batch(), that way all of your batches will be the same size.

FYI, it's usually better practice to batch your dataset outside of the function call for fit:

data = data.batch(32)
model.fit(data)

CodePudding user response：

the loss function can not be set in the call method . the call method is intended to make a forward pass not to copute the loss .

u need to add the loss function in the compile method or after it