I am having an issue with my code, which I modified from https://keras.io/examples/generative/wgan_gp/ . Instead of the data being images, my data is a (1001,2) array of sequential data, where the first column is time and the second is velocity measurements. I'm getting this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14704/3651127346.py in <module>
21 # Training the WGAN-GP model
22 tic = time.perf_counter()
---> 23 WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
24 toc = time.perf_counter()
25 time_elapsed(toc-tic)
~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
return step_function(self, iterator)
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 141, in train_step
gp = self.gradient_penalty(batch_size, x_real, x_fake)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 106, in gradient_penalty
alpha = tf.random.uniform(batch_size,1,1)
ValueError: Shape must be rank 1 but is rank 0 for '{{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0](strided_slice)' with input shapes: [].
And here is my code:
import time
from tqdm.notebook import tqdm
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import numpy as np
import matplotlib.pyplot as plt
def define_generator(latent_dim):
    # This function creates the generator model using the functional API.
    # Input layer
    inputs = Input(shape=(latent_dim,), name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(2, activation='linear', name='OUTPUT_LAYER')(x)
    # Instantiating the generator model
    model = Model(inputs=inputs, outputs=outputs, name='GENERATOR')
    return model
def generator_loss(fake_logits):
    # This function calculates and returns the WGAN-GP generator loss.
    # Expected value of critic output from fake samples
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = -expectation_fake
    return loss
def define_critic():
    # This function creates the critic model using the functional API.
    # Input layer
    inputs = Input(shape=(2,), name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(1, activation='linear', name='OUTPUT_LAYER')(x)
    # Instantiating the critic model
    model = Model(inputs=inputs, outputs=outputs, name='CRITIC')
    return model
def critic_loss(real_logits, fake_logits):
    # This function calculates and returns the WGAN-GP critic loss.
    # Expected value of critic output from real samples
    expectation_real = tf.reduce_mean(real_logits)
    # Expected value of critic output from fake samples
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = expectation_fake - expectation_real
    return loss
class define_wgan(keras.Model):
    # This class creates the WGAN-GP object.
    # Attributes:
    #   critic = the critic model.
    #   generator = the generator model.
    #   latent_dim = defines the generator input dimension.
    #   critic_steps = defines how many times the critic gets trained for each training cycle.
    #   gp_weight = the weight of the gradient penalty term in the critic loss.
    # Methods:
    #   compile() = defines the optimizer and loss function of both the critic and generator.
    #   gradient_penalty() = calculates and returns the gradient penalty term in the WGAN-GP loss function.
    #   train_step() = performs the WGAN-GP training by updating the critic and generator weights
    #                  and returns the loss for both. Called by fit().
    def __init__(self, gen, critic, latent_dim, n_critic_train, gp_weight):
        super().__init__()
        self.critic = critic
        self.generator = gen
        self.latent_dim = latent_dim
        self.critic_steps = n_critic_train
        self.gp_weight = gp_weight

    def compile(self, generator_loss, critic_loss):
        super().compile()
        self.generator_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.critic_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.generator_loss_function = generator_loss
        self.critic_loss_function = critic_loss
    def gradient_penalty(self, batch_size, x_real, x_fake):
        # Random uniform samples of points between the two distributions.
        # "alpha" must be a tensor so that "x_interp" will also be a tensor.
        alpha = tf.random.uniform(batch_size,1,1)
        # Data interpolated between the real and fake distributions
        x_interp = alpha*x_real + (1-alpha)*x_fake
        # Calculating critic output gradient wrt interpolated data
        with tf.GradientTape() as gp_tape:
            gp_tape.watch(x_interp)
            critic_output = self.critic(x_interp, training=True)
        grad = gp_tape.gradient(critic_output, [x_interp])[0]
        # Calculating norm of gradient
        grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad)))
        # Calculating gradient penalty
        gp = tf.reduce_mean((grad_norm - 1.0)**2)
        return gp
    def train_step(self, x_real):
        # Critic training
        # Getting batch size for creating latent vectors
        print(x_real)
        batch_size = tf.shape(x_real)[0]
        print(batch_size)
        # Critic training loop
        for i in range(self.critic_steps):
            # Generating latent vectors
            latent = tf.random.normal(shape=(batch_size, self.latent_dim))
            with tf.GradientTape() as tape:
                # Obtaining fake data from the generator
                x_fake = self.generator(latent, training=True)
                # Critic output from fake data
                fake_logits = self.critic(x_fake, training=True)
                # Critic output from real data
                real_logits = self.critic(x_real, training=True)
                # Calculating critic loss
                c_loss = self.critic_loss_function(real_logits, fake_logits)
                # Calculating gradient penalty
                gp = self.gradient_penalty(batch_size, x_real, x_fake)
                # Adjusting critic loss with the gradient penalty
                c_loss = c_loss + self.gp_weight*gp
            # Calculating gradient of critic loss wrt critic weights
            critic_grad = tape.gradient(c_loss, self.critic.trainable_variables)
            # Updating critic weights
            self.critic_optimizer.apply_gradients(zip(critic_grad, self.critic.trainable_variables))
        # Generator training
        # Generating latent vectors
        latent = tf.random.normal(shape=(batch_size, self.latent_dim))
        with tf.GradientTape() as tape:
            # Obtaining fake data from the generator
            x_fake = self.generator(latent, training=True)
            # Critic output from fake data
            fake_logits = self.critic(x_fake, training=True)
            # Calculating generator loss
            g_loss = self.generator_loss_function(fake_logits)
        # Calculating gradient of generator loss wrt generator weights
        generator_grad = tape.gradient(g_loss, self.generator.trainable_variables)
        # Updating generator weights
        self.generator_optimizer.apply_gradients(zip(generator_grad, self.generator.trainable_variables))
        # Keras expects train_step to return a dict of metrics
        return {"g_loss": g_loss, "c_loss": c_loss}
class GAN_monitor(keras.callbacks.Callback):
    def __init__(self, n_samples, latent_dim):
        self.n_samples = n_samples
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        latent = tf.random.normal(shape=(self.n_samples, self.latent_dim))
        generated_data = self.model.generator(latent)
        plt.plot(generated_data)
        plt.savefig('Epoch _' + str(epoch) + '.png', dpi=300)
data = np.genfromtxt('Flight_1.dat', dtype='float', encoding=None, delimiter=',')[0:1001,0]
time_span = np.linspace(0,20,1001)
dataset = np.concatenate((time_span[:,np.newaxis], data[:,np.newaxis]), axis=1)
dataset.shape
# Training Parameters
latent_dim = 100
n_epochs = 10
n_critic_train = 5
gp_weight = 10
batch_Size = 100
# Instantiating the generator and discriminator models
gen = define_generator(latent_dim)
critic = define_critic()
# Instantiating the WGAN-GP object
WGAN = define_wgan(gen, critic, latent_dim, n_critic_train, gp_weight)
# Compiling the WGAN-GP model
WGAN.compile(generator_loss, critic_loss)
# Instantiating custom Keras callback
cbk = GAN_monitor(n_samples=1, latent_dim=latent_dim)
# Training the WGAN-GP model
tic = time.perf_counter()
WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
toc = time.perf_counter()
time_elapsed(toc-tic)
The issue is the shape I am providing to tf.random.uniform() for the assignment of alpha. I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example, so I don't know how to specify the shape for my example. Furthermore, I don't understand this line in the Keras example:
batch_size = tf.shape(real_images)[0]
In this example 'real_images' is a (60000, 28, 28, 1) array, and it gets passed to the fit() method, which then passes it to the train_step() method. (It gets passed as "train_images", but they are the same variable.) If I add a line that prints out 'real_images' before this tf.shape() call, this is what it produces:
Tensor("IteratorGetNext:0", shape=(None, 28, 28, 1), dtype=float32)
Why is the 60000 now None? Then I added a line that prints out "batch_size" after the tf.shape() call, and this is what it produces:
Tensor("strided_slice:0", shape=(), dtype=int32)
I googled "tf strided_slice", but all I could find is the method tf.strided_slice(). So what exactly is the value of "batch_size", and why is the printed output of these variables so ambiguous when they are tensors? In fact, when I type:
tf.shape(train_images)[0]
in another cell of my Jupyter notebook, I get a completely different output:
<tf.Tensor: shape=(), dtype=int32, numpy=60000>
I really need to understand this Keras example in order to successfully implement this code for my data. Any help is appreciated.
BTW: I am using only one set of data for now, but once I get the GAN running I will provide multiple of these (1001,2) datasets. Also, if you want to test the code yourself, replacing the "dataset" variable with any (1001,2) numpy array should suffice. Thank you.
CodePudding user response:
'Why is the 60000 now None?': When a TensorFlow model is defined, the first dimension (the batch size) is left as None. What goes on under the hood with TensorFlow's computation graphs can be quite complex, but all you need to know right now is that the batch size does not have to be specified when defining the model, hence None. This is essential, as it allows a model to be defined once and then trained on and applied to datasets with an arbitrary number of examples: during training you might feed the model batches of 256 images at a time, but when using the trained model for inference you might only want to pass in a single image. The actual value of the first dimension therefore only matters once the computation actually begins.
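As a minimal standalone sketch (not part of your code) of how this plays out, a plain tf.function reproduces exactly what you saw, since Keras traces train_step into a graph in essentially the same way:

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32)])
def traced_step(batch):
    # These prints run once, while the function is traced into a graph.
    print(batch.shape)       # (None, 28, 28, 1) - static shape, batch dim unknown
    batch_size = tf.shape(batch)[0]
    print(batch_size)        # Tensor("strided_slice:0", shape=(), dtype=int32)
    return batch_size

x = tf.zeros((32, 28, 28, 1))
print(traced_step(x))        # tf.Tensor(32, shape=(), dtype=int32) - concrete at run time
print(tf.shape(x)[0])        # eager tensors are concrete, just like in your notebook cell

This also answers the strided_slice question: tf.shape(batch) returns a rank 1 tensor of dimension sizes, and indexing it with [0] is implemented as a strided-slice op in the graph, hence the node name "strided_slice:0". Its value (your batch_size) is a scalar int32 tensor that is only filled in when the graph runs on an actual batch.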
'I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example': The reason for this shape is that you want a different random value, alpha, for each image. You have batch_size images, hence batch_size in the first dimension, but alpha is just a single value per image, so every other dimension only needs size 1. The reason it has 4 dimensions overall is so that it can broadcast in calculations with the inputs, which are 4-D image tensors with a shape like (batch_size, img_h, img_w, 3) for color images with 3 RGB channels.
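To see the broadcasting concretely, here is a small sketch (illustrative shapes only): one alpha per image stretches across all pixels and channels, and the analogous shape for your (batch_size, 2) data would be (batch_size, 1):

import tensorflow as tf

batch_size = 4
# Image case: one random alpha per image, broadcast over height, width, channels
images = tf.zeros((batch_size, 28, 28, 1))
alpha_img = tf.random.uniform([batch_size, 1, 1, 1])
print((alpha_img * images).shape)   # (4, 28, 28, 1)

# (time, velocity) case: one random alpha per sample, broadcast over the 2 columns
samples = tf.zeros((batch_size, 2))
alpha_seq = tf.random.uniform([batch_size, 1])
print((alpha_seq * samples).shape)  # (4, 2)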
In terms of understanding your error, 'Shape must be rank 1 but is rank 0' is saying that the first argument of tf.random.uniform must be a rank 1 tensor (something with one dimension, such as a list describing a shape), but it is being passed a rank 0 tensor (a scalar value). In your code you are passing the scalar batch_size directly rather than a shape. For the image example this works instead:
alpha = tf.random.uniform([batch_size, 1, 1, 1])
The first parameter of this function is the shape, so it is important to have the [] there. Since your data is rank 2 with shape (batch_size, 2) rather than a 4-D image tensor, the equivalent for your case is [batch_size, 1]: one random alpha per sample, broadcast across the two columns. Check out the documentation on this function to make sure you're using it correctly: https://www.tensorflow.org/api_docs/python/tf/random/uniform.
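Putting the pieces together, here is a sketch of how gradient_penalty could look for your (batch_size, 2) data. This is illustrative rather than guaranteed drop-in code: it keeps your method signature, fixes the alpha shape, and computes the per-sample gradient norm over axis=1 in the spirit of the Keras image example (which reduces over axis=[1, 2, 3]):

def gradient_penalty(self, batch_size, x_real, x_fake):
    # One interpolation coefficient per sample, broadcast over both columns
    alpha = tf.random.uniform([batch_size, 1], 0.0, 1.0)
    x_interp = alpha * x_real + (1 - alpha) * x_fake
    with tf.GradientTape() as gp_tape:
        gp_tape.watch(x_interp)
        critic_output = self.critic(x_interp, training=True)
    # Gradient of the critic output wrt the interpolated samples
    grad = gp_tape.gradient(critic_output, [x_interp])[0]
    # Per-sample L2 norm, then mean squared deviation from 1
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad), axis=1))
    return tf.reduce_mean((grad_norm - 1.0) ** 2)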