I am trying to create a neural network with Python, a simple ANN to use on a classification problem. The purpose of the network is to classify who is speaking: me or someone else. I have the data in two folders: one called me, with audios of me speaking, and another called other, with audios of other people speaking. [image: view of the wav files (audio data)]
The problem is that I cannot train the network because the data is not the same length, even though there are 18 files in each folder, not one more, not one less.
When I do
print(X.shape)
print(y.shape)
it gives this: [image: result of X, y shapes]. The shapes are not the same, even though there are 18 audio files in each folder.
model.py
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
import numpy as np
from scipy.io import wavfile
from pathlib import Path
import os
### DATASET
pathlist = Path(os.path.abspath('Voiceclassification/Data/me/')).rglob('*.wav')
# My voice data
for path in pathlist:
    filename = str(path)
    # convert the audio to a numpy array, then flatten 2D to 1D
    samplerate, data = wavfile.read(filename)
    # print(f"sample rate: {samplerate}")
    data = data.flatten()
    # print(f"data: {data}")
pathlist2 = Path(os.path.abspath('Voiceclassification/Data/other/')).rglob('*.wav')
# other voice data
for path2 in pathlist2:
    filename2 = str(path2)
    samplerate2, data2 = wavfile.read(filename2)
    data2 = data2.flatten()
    # print(data2)
### ADAPTING THE DATA FOR THE MODEL
X = data # My voice
y = data2 # Other data
#print(X.shape)
#print(y.shape)
### Training the model
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
# Performing feature scaling
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
### Creating the ANN
ann = tf.keras.models.Sequential()
# First hidden layer of the ann
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Second one
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Output layer
ann.add(tf.keras.layers.Dense(units=6, activation="sigmoid"))
# Compile our neural network
ann.compile(optimizer="adam",
            loss="binary_crossentropy",
            metrics=['accuracy'])
# Fit ANN
ann.fit(x_train, y_train, batch_size=32, epochs=100)
ann.save('train_model.model')
Any ideas?
CodePudding user response:
It's probably because your wav audio files have slightly different lengths. They may all be about 10 seconds, but if the milliseconds differ, the number of samples differs, and that changes your data shape. What you can do is trim your wav files so that all of them are exactly 10.00 s, with no extra milliseconds.
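As a minimal sketch of that idea (assuming all your files share the same sample rate; the 10-second target and the load_fixed_length helper are hypothetical names, not from your code), you could truncate or zero-pad every clip to a fixed sample count before stacking them:

import numpy as np
from scipy.io import wavfile
from pathlib import Path

TARGET_SECONDS = 10  # hypothetical target length; pick one that fits your recordings

def load_fixed_length(folder, target_seconds=TARGET_SECONDS):
    # Load every .wav under `folder`, forcing each clip to the same number of samples
    clips = []
    for path in Path(folder).rglob('*.wav'):
        samplerate, data = wavfile.read(str(path))
        data = data.flatten()
        n = samplerate * target_seconds
        if len(data) >= n:
            data = data[:n]  # trim clips that run long
        else:
            data = np.pad(data, (0, n - len(data)))  # zero-pad clips that run short
        clips.append(data)
    return np.array(clips)  # shape: (num_files, n)

me = load_fixed_length('Voiceclassification/Data/me/')
other = load_fixed_length('Voiceclassification/Data/other/')
print(me.shape, other.shape)  # both should now be (18, samplerate * TARGET_SECONDS)

Trimming throws away any audio past the cutoff, while zero-padding keeps everything but appends silence; either way every row ends up the same length, so the stacked arrays get a consistent 2-D shape no matter the original durations.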