ML model fit with training data outperforms model fit with generator

My ultimate goal is to fit an ML autoencoder by passing a data generator to the fit method of the Keras API. However, I am finding that models fit with a generator are inferior to models fit on the raw data themselves.

To demonstrate this, I've taken the following steps:

  1. Define a data generator that creates a set of variably damped sine waves. Importantly, I set the generator's batch size equal to the entire training dataset. In this way, I can eliminate batch size as a possible reason for poor performance of the model fit with the generator.
  2. Define a very simple ML autoencoder. Note that the latent space of the autoencoder is larger than the original data, so it should learn to reproduce the original signal relatively quickly.
  3. Train one model using the generator.
  4. Create a set of training data with the generator's __getitem__ method and use those data to fit the same ML model.

Results from the model trained with the generator are far inferior to those from the model trained on the data themselves.

My formulation of the generator must be wrong, but, for the life of me, I cannot find my mistake. For reference, I emulated the generators discussed here and here.

Update:

I simplified the problem so that the generator, instead of producing a series of randomly parameterized damped sine waves, now produces a vector of ones (i.e., np.ones((batch_size, 1000, 1))). I fit my autoencoder model and, as before, the model fit with the generator still underperforms relative to the model fit on the raw data themselves.

Side note: I edited the originally posted code to reflect this update.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv1D, Conv1DTranspose, MaxPool1D


""" Generate training/testing data data (i.e., a vector of ones) """


class DataGenerator(tf.keras.utils.Sequence):
    def __init__(
        self,
        batch_size,
        vector_length,
    ):
        self.batch_size = batch_size
        self.vector_length = vector_length

    def __getitem__(self, index):
        # every call returns the same full-dataset batch of ones
        x = np.ones((self.batch_size, self.vector_length, 1))
        y = np.ones((self.batch_size, self.vector_length, 1))
        return x, y

    def __len__(self):
        return 1  # one batch per epoch; the batch spans the whole dataset


vector_length = 1000
train_gen = DataGenerator(800, vector_length)
test_gen = DataGenerator(200, vector_length)
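
# optional sanity check: the generator should yield a single full-dataset batch
# of ones with shape (batch_size, vector_length, 1)
sanity_x, sanity_y = train_gen[0]
assert sanity_x.shape == (800, vector_length, 1)
assert np.array_equal(sanity_x, sanity_y)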


""" Machine Learning Model and Training """

# Class to hold ML model
class MLModel:
    def __init__(self, n_inputs):
        self.n_inputs = n_inputs

        visible = Input(shape=n_inputs)
        encoder = Conv1D(
            filters=1,
            kernel_size=100,
            padding="same",
            strides=1,
            activation="LeakyReLU",
        )(visible)
        encoder = MaxPool1D(pool_size=2)(encoder)

        # decoder
        decoder = Conv1DTranspose(
            filters=1,
            kernel_size=100,
            padding="same",
            strides=2,
            activation="linear",
        )(encoder)

        model = Model(inputs=visible, outputs=decoder)
        model.compile(optimizer="adam", loss="mse")
        self.model = model


""" EXPERIMENT 1 """

# instantiate a model
n_inputs = (vector_length, 1)
model1 = MLModel(n_inputs).model

# train first model!
model1.fit(x=train_gen, epochs=10, validation_data=test_gen)

""" EXPERIMENT 2 """

# use the generator to create training and testing data
train_x, train_y = train_gen.__getitem__(0)
test_x, test_y = test_gen.__getitem__(0)

# instantiate a new model
model2 = MLModel(n_inputs).model

# train second model!
history = model2.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=10)


""" Model evaluation and plotting """

pred_y1 = model1.predict(test_x)
pred_y2 = model2.predict(test_x)

plt.ion()
plt.clf()
n = 0
plt.plot(test_y[n, :, 0], label="True Signal")
plt.plot(pred_y1[n, :, 0], label="Model1 Prediction")
plt.plot(pred_y2[n, :, 0], label="Model2 Prediction")
plt.legend()

CodePudding user response:

I made a rookie mistake: I forgot that model.fit defaults to batch_size=32 when it is given NumPy arrays. Therefore the experiments posted above are not an "apples-to-apples" comparison, because the model fit with the generator used a batch size of 800 while the model fit on the data themselves used a batch size of 32. When the same batch size is set for both experiments, the two models perform similarly.
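
For illustration, a minimal sketch of the corrected comparison, reusing the names from the code above (batch_size is ignored when x is a generator, so only the array-based call needs it):

# generator-based fit: the batch size is whatever the generator yields (800 here)
model1.fit(x=train_gen, epochs=10, validation_data=test_gen)

# array-based fit: pass batch_size explicitly, otherwise model.fit falls back to 32
model2.fit(
    train_x,
    train_y,
    validation_data=(test_x, test_y),
    epochs=10,
    batch_size=800,  # match the generator's batch size
)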

P.S. In case it is helpful to anyone: I didn't realize how important batch size is as a hyperparameter. There are, of course, caveats, nuances, and exceptions, but apparently a smaller batch size helps a model generalize. I won't belabor the subject, but there are interesting reads here, here, and here.
