I am trying to create a neural network with Python, a simple ANN to use on a classification problem. The purpose of the network is to classify who is speaking: me or someone else. I have the data in two folders: one called me, with audios of me speaking, and another called other, with audios of other people speaking. [image: view of the wav files (audio data)]
The problem is that I cannot train the network because the data is not the same length, even though there are 18 files in each folder, not one more, not one less.
When I do
print(X.shape)
print(y.shape)
it gives this: [image: result of X, y shapes]. The shapes are not the same, even though there are 18 audio files in each folder.
model.py
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
import numpy as np
from scipy.io import wavfile
from pathlib import Path
import os
### DATASET
pathlist = Path(os.path.abspath('Voiceclassification/Data/me/')).rglob('*.wav')
# My voice data
for path in pathlist:
    filename = str(path)
    # convert the audio to a numpy array, then flatten 2D to 1D
    samplerate, data = wavfile.read(filename)
    # print(f"sample rate: {samplerate}")
    data = data.flatten()
    # print(f"data: {data}")
pathlist2 = Path(os.path.abspath('Voiceclassification/Data/other/')).rglob('*.wav')
# other voice data
for path2 in pathlist2:
    filename2 = str(path2)
    samplerate2, data2 = wavfile.read(filename2)
    data2 = data2.flatten()
    # print(data2)
### ADAPTING THE DATA FOR THE MODEL
X = data # My voice
y = data2 # Other data
#print(X.shape)
#print(y.shape)
### Training the model
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
# Performing feature scaling
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
### Creating the ANN
ann = tf.keras.models.Sequential()
# First hidden layer of the ann
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Second one
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Output layer
ann.add(tf.keras.layers.Dense(units=6, activation="sigmoid"))
# Compile our neural network
ann.compile(optimizer="adam",
            loss="binary_crossentropy",
            metrics=['accuracy'])
# Fit ANN
ann.fit(x_train, y_train, batch_size=32, epochs=100)
ann.save('train_model.model')
Any ideas?
CodePudding user response:
It's probably because your wav audio files have slightly different lengths. They may all be about 10 seconds, but if the milliseconds differ, the number of samples differs, and that changes your data shape. What you can do is trim your wav files so that all of them are exactly 10.00 s, with no extra milliseconds.
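As a minimal sketch of that idea (assuming all your files share the same sample rate; the 10-second target and the load_fixed_length helper are hypothetical names, not from your code), you could truncate or zero-pad every clip to a fixed sample count before stacking them:

import numpy as np
from scipy.io import wavfile
from pathlib import Path

TARGET_SECONDS = 10  # hypothetical target length; pick one that fits your recordings

def load_fixed_length(folder, target_seconds=TARGET_SECONDS):
    # Load every .wav under `folder`, forcing each clip to the same number of samples
    clips = []
    for path in Path(folder).rglob('*.wav'):
        samplerate, data = wavfile.read(str(path))
        data = data.flatten()
        n = samplerate * target_seconds
        if len(data) >= n:
            data = data[:n]  # trim clips that run long
        else:
            data = np.pad(data, (0, n - len(data)))  # zero-pad clips that run short
        clips.append(data)
    return np.array(clips)  # shape: (num_files, n)

me = load_fixed_length('Voiceclassification/Data/me/')
other = load_fixed_length('Voiceclassification/Data/other/')
print(me.shape, other.shape)  # both should now be (18, samplerate * TARGET_SECONDS)

Trimming throws away any audio past the cutoff, while zero-padding keeps everything but appends silence; either way every row ends up the same length, so the stacked arrays get a consistent 2-D shape no matter the original durations.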