I'm currently trying to create a WGAN implementation with gradient penalty in Keras, following the setup here: https://keras.io/examples/generative/wgan_gp/. However, I have modified it to generate time series using RNNs.
The time series in the training set have variable lengths, so I am training the model using train_on_batch with one time series at a time. I have modified the train_step function in the code linked above to handle this, with the new function given by
def train_step(self, X):
    if isinstance(X, tuple):
        X = X[0]

    batch_size = X.get_shape()[0]
    timesteps = X.get_shape()[1]

    # Train the discriminator for d_steps iterations
    for i in range(self.d_steps):
        # Get the latent vector, repeated along the time axis
        noise = tf.random.normal((batch_size, self.latent_dims))
        noise = tf.reshape(noise, (batch_size, 1, self.latent_dims))
        noise = tf.repeat(noise, timesteps, 1)
        with tf.GradientTape() as tape:
            fake_images = self.generator(noise, training=True)
            fake_logits = self.discriminator(fake_images, training=True)
            real_logits = self.discriminator(X, training=True)
            d_cost = self.d_loss_fn(real_img=real_logits, fake_img=fake_logits)
            gp = self.gradient_penalty(batch_size, X, fake_images)
            # Discriminator loss plus the weighted gradient penalty
            d_loss = d_cost + gp * self.gp_weight
        d_gradient = tape.gradient(d_loss, self.discriminator.trainable_variables)
        self.d_optimizer.apply_gradients(
            zip(d_gradient, self.discriminator.trainable_variables)
        )

    # Train the generator
    noise = tf.random.normal((batch_size, self.latent_dims))
    noise = tf.reshape(noise, (batch_size, 1, self.latent_dims))
    noise = tf.repeat(noise, timesteps, 1)
    with tf.GradientTape() as tape:
        generated_data = self.generator(noise, training=True)
        gen_img_logits = self.discriminator(generated_data, training=True)
        g_loss = self.g_loss_fn(gen_img_logits)
    gen_gradient = tape.gradient(g_loss, self.generator.trainable_variables)
    self.g_optimizer.apply_gradients(
        zip(gen_gradient, self.generator.trainable_variables)
    )
    return {"d_loss": d_loss, "g_loss": g_loss}
and run this using
for epoch in epochs:
    names = train_df.name.unique()
    for batch in nbatches:
        name = names[batch]
        X = train_df[train_df.name == name].values
        X = X[:, 1:]  # removes name column
        X = X.reshape((1, *X.shape))  # add a batch axis: (1, timesteps, 12)
        wgan.train_on_batch(X)
Here, train_df is just a pandas DataFrame with 13 columns. The first column contains the name of each time series, which is used to separate out the data, and the other 12 columns contain the observations in the time series, with values between 0 and 1.
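As a rough illustration of that layout, here is a minimal sketch that builds a toy DataFrame and turns each series into an array of shape (1, timesteps, 12). The names toy_df, series_batches, and the column labels are hypothetical and not from the original code, and the cast to float32 is an assumption about what the model expects.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: series "a" has 3 timesteps, series "b" has 2.
# The name column comes first, followed by 12 observation columns in [0, 1).
rng = np.random.default_rng(0)
names = ["a"] * 3 + ["b"] * 2
obs = rng.random((len(names), 12))
toy_df = pd.DataFrame(obs, columns=[f"f{i}" for i in range(12)])
toy_df.insert(0, "name", names)


def series_batches(df, name_col="name"):
    """Yield (name, array) pairs, each array shaped (1, timesteps, n_features)."""
    for name, group in df.groupby(name_col, sort=False):
        # Drop the name column and cast the observations to float32
        x = group.drop(columns=name_col).to_numpy(dtype="float32")
        # Add a leading batch axis of size 1
        yield name, x[np.newaxis, ...]


shapes = {name: x.shape for name, x in series_batches(toy_df)}
print(shapes)  # {'a': (1, 3, 12), 'b': (1, 2, 12)}
```

Each yielded array could then be passed to train_on_batch in place of the manual mask-and-reshape steps.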
The idea is that, for each time series, the first part of train_step generates noise with the same number of timesteps as that series, which ensures that the generated data has the same shape as the real data.
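The noise construction described here can be checked in isolation. The sketch below wraps the same three calls from train_step in a hypothetical helper, make_noise, with an assumed latent size of 4, and confirms that one latent vector per sample is repeated along the time axis.

```python
import tensorflow as tf


def make_noise(batch_size, timesteps, latent_dims):
    """Draw one latent vector per sample and repeat it along the time axis."""
    z = tf.random.normal((batch_size, latent_dims))
    z = tf.reshape(z, (batch_size, 1, latent_dims))
    return tf.repeat(z, timesteps, axis=1)


# Assumed sizes for illustration: one series, 18 timesteps, latent size 4
noise = make_noise(batch_size=1, timesteps=18, latent_dims=4)
print(noise.shape)  # (1, 18, 4)

# The first and last timesteps carry the same latent vector
same_vector = bool(tf.reduce_all(noise[:, 0, :] == noise[:, -1, :]))
print(same_vector)  # True
```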
The number of timesteps is supposed to be given by X.get_shape()[1]. On the first iteration, X is a NumPy array of shape (1, 18, 12), and the tensor X inside train_step also has shape (1, 18, 12), so timesteps = X.get_shape()[1] sets timesteps to 18 as expected. On the second iteration, X is a NumPy array of shape (1, 15, 12) and this also works as expected. However, on the third iteration X is a NumPy array of shape (1, 13, 12), but inside train_step its shape is now (1, None, 12), so timesteps is set to None and the code fails.
I'm very confused about why X.get_shape() works correctly at first but not from the third iteration onward, and I can't find a fix. Basically, I just need to set timesteps to the correct value. I was also thinking of passing the value in as a separate variable, rather than relying on get_shape, but I can't think of a way to do that. Can anyone suggest why get_shape starts returning None after 2 iterations and how to avoid it? If you've got this far, thanks very much for reading and apologies for the length!
CodePudding user response:
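Try changing:

timesteps = X.get_shape()[1]

to:

timesteps = tf.shape(X)[1]

X.get_shape() returns the static shape, which is fixed when the graph is traced and can contain None. tf.shape(X) returns the dynamic shape of X, which is evaluated at run time during training.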
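A likely explanation, as far as I understand it, is that Keras compiles train_step into a graph and, after seeing several different sequence lengths, relaxes the traced input shape to (1, None, 12) rather than retracing for every new length. The sketch below reproduces that situation by declaring the time axis as None up front. The function make_noise_like, the constant LATENT_DIMS, and the list traced_static_timesteps are hypothetical names used only for this demonstration.

```python
import tensorflow as tf

LATENT_DIMS = 4  # hypothetical latent size
traced_static_timesteps = []  # filled by Python code that runs only while tracing


@tf.function(input_signature=[tf.TensorSpec(shape=(1, None, 12), dtype=tf.float32)])
def make_noise_like(x):
    # Static shape: known at trace time only, so the time axis is None here
    traced_static_timesteps.append(x.shape[1])
    # Dynamic shape: scalar tensors evaluated on every call
    batch_size = tf.shape(x)[0]
    timesteps = tf.shape(x)[1]
    z = tf.random.normal((batch_size, 1, LATENT_DIMS))
    return tf.repeat(z, timesteps, axis=1)


# Sequence lengths taken from the question
noise_shapes = []
for t in (18, 15, 13):
    noise = make_noise_like(tf.zeros((1, t, 12)))
    noise_shapes.append(tuple(noise.shape))

print(traced_static_timesteps)
print(noise_shapes)
```

Because batch_size and timesteps are tensors here, they can be passed straight to tf.random.normal and tf.repeat, but they cannot be used where a Python integer is required.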