Python TensorFlow Shape Mismatch (WaveNet)

I was trying to run a WaveNet classifier based on the implementation at https://github.com/mjpyeon/wavenet-classifier/blob/master/WaveNetClassifier.py.

Part of my code is as follows:

def residual_block(self, x, i):
    tanh_out = Conv1D(self.n_filters, self.kernel_size, dilation_rate=self.kernel_size ** i,
                      padding='causal', name='dilated_conv_%d_tanh' % (self.kernel_size ** i),
                      activation='tanh')(x)

    sigm_out = Conv1D(self.n_filters, self.kernel_size, dilation_rate=self.kernel_size ** i,
                      padding='causal', name='dilated_conv_%d_sigm' % (self.kernel_size ** i),
                      activation='sigmoid')(x)

    # 'z' multiplies the 2 Conv1D layer (one with tanh activation function & the other with
    # sigmoid activation function)
    z = Multiply(name='gated_activation_%d' % (i))([tanh_out, sigm_out])

    # Skip Layer includes 'z' going through Conv1D layer
    skip = Conv1D(self.n_filters, 1, name='skip_%d' % (i))(z)

    # Residual Layer adds the output from the skip layer & the original input
    res = Add(name='residual_block_%d' % (i))([skip, x])

    return res, skip

def train_dataset(self, X_train, y_train, validation_data=None, epochs=100):
    with tf.device('/GPU:0'):
        # 1. Input Layer
        x = Input(shape=self.input_shape, name='original_input')

        
        # 2. Creating a Skip Connection using specified no. of residual blocks
        skip_connections = []
        out = Conv1D(self.n_filters, 2, dilation_rate=1, padding='causal',
                     name='dilated_conv_1')(x)
        for i in range(1, self.dilation_depth + 1):
            # The output from a residual block is fed back to the next residual block
            out, skip = self.residual_block(out, i)
            skip_connections.append(skip)

            
        # 3. ReLU Activation Function
        out = Add(name='skip_connections')(skip_connections)
        out = Activation('relu')(out)
        
        
        # 4. Series of Conv1D and AveragePooling1D Layer
        out = Conv1D(self.n_filters, 80, strides=1, padding='same', name='conv_5ms', 
                     activation='relu')(out)
        out = AveragePooling1D(80, padding='same', name='downsample_to_200Hz')(out)
        out = Conv1D(self.n_filters, 100, padding='same', activation='relu', 
                     name='conv_500ms')(out)
        out = Conv1D(self.output_shape[0], 100, padding='same', activation='relu', 
                     name='conv_500ms_target_shape')(out)
        out = AveragePooling1D(100, padding='same', name='downsample_to_2Hz')(out)
        out = Conv1D(self.output_shape[0], int(self.input_shape[0] / 8000),
                     padding='same', name='final_conv')(out)
        out = AveragePooling1D(int(self.input_shape[0] / 8000), name='final_pooling')(out)
        
        
        # 5. Reshaping into output dimension & Going through activation function
        out = Reshape(self.output_shape)(out)
        out = Activation('sigmoid')(out)
        print(out.shape)
        
        model = Model(x, out)
        model.summary()

        # Compiling the Model
        model.compile('adam', 'binary_crossentropy',
                      metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.7)])

        # Early Stopping
        callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)

        history = model.fit(X_train, y_train, shuffle=True, epochs=epochs, batch_size=32,
                            validation_data=validation_data, callbacks=[callback])

        return history

Here, self.input_shape=X_train.shape and self.output_shape=(11,)

It printed the model summary successfully, but then raised the following error:

ValueError: Input 0 is incompatible with layer model_1: expected shape=(None, 19296, 110250), found shape=(32, 110250)

However, my X_train has the shape (19296, 110250). I have been trying to figure out why X_train was reshaped from (19296, 110250) to (32, 110250), but couldn't work it out.

(19296 is the number of songs, and 110250 is the number of samples in a 5-second audio clip at a sampling rate of 22050 Hz, loaded with the Python librosa library.)

What is wrong with my code? Thank you in advance!

Answer:

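Your data is missing a dimension. A Conv1D layer expects each sample to have the shape (timesteps, features), so a batch has the shape (batch, timesteps, features). Your X_train of shape (19296, 110250) has only one axis per sample. Also note that Input(shape=...) describes a single sample and Keras adds the batch axis itself, so setting self.input_shape=X_train.shape makes the model expect (None, 19296, 110250). The (32, 110250) in the error is not a reshape of your data: it is simply one batch of 32 songs, because you call fit with batch_size=32. Once each sample has a feature axis, pass the per-sample shape (e.g. X_train.shape[1:]) to Input. So maybe try something like this: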

import tensorflow as tf

sample = 1
x_train = tf.random.normal((sample, 110250))

# Option 1: add a trailing feature axis -> (batch, 110250, 1)
option1 = tf.expand_dims(x_train, axis=-1)
tf.print('expand_dims -->', option1.shape)

# Option 2: split each 5 s clip into 5 frames of 22050 samples -> (batch, 5, 22050)
option2 = tf.reshape(x_train, (tf.shape(x_train)[0], 5, 22050))
tf.print('reshape -->', option2.shape)

expand_dims --> TensorShape([1, 110250, 1])
reshape --> TensorShape([1, 5, 22050])

Note that I only used one sample here, but the same applies to your full (19296, 110250) array.
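To see both points together, here is a minimal sketch. It is a stand-in, not the original WaveNet: it first checks what Keras builds when the full dataset shape is passed to Input, then trains a tiny hypothetical Conv1D model on random data after adding the channel axis. The sizes (64 clips of 1000 samples), the layers, and the random labels are made-up assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

# Input(shape=...) describes ONE sample; Keras prepends the batch axis (None).
# Passing the whole dataset shape therefore makes every batch look like a full dataset.
wrong = tf.keras.Input(shape=(19296, 110250))
print(wrong.shape)  # -> (None, 19296, 110250)

# Hypothetical small stand-in for the real data (19296 clips x 110250 samples).
n_clips, n_samples = 64, 1000
X_train = np.random.randn(n_clips, n_samples).astype("float32")
y_train = np.random.randint(0, 2, size=(n_clips, 11)).astype("float32")

# Add a trailing feature axis so each sample is (timesteps, features) = (n_samples, 1).
X_train = X_train[..., np.newaxis]  # (64, 1000, 1)
input_shape = X_train.shape[1:]     # (1000, 1): batch axis dropped

# Tiny hypothetical model: one causal dilated Conv1D, then pooling and 11 sigmoid outputs.
inputs = tf.keras.Input(shape=input_shape)
x = tf.keras.layers.Conv1D(16, 2, dilation_rate=2, padding="causal",
                           activation="tanh")(inputs)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(11, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile("adam", "binary_crossentropy")

# With batch_size=32, each training step receives a (32, 1000, 1) slice of X_train;
# this slicing is where the 32 in the original error message comes from.
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)
```

With the real data, this corresponds to X_train = X_train[..., np.newaxis] and self.input_shape = X_train.shape[1:], i.e. (110250, 1). Any validation data passed to fit needs the same extra axis.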