AssertionError: Signal dimention should be of the format of (N,) but it is (743424, 2) instead

Time:11-22

For my ML project, I'm using a model that takes a video file and an audio file as input and detects synthetic voice in the video.

But it raises an error in the audio_processing() function:

Code for audio_processing():

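# Imports assumed from the names used below (not shown in the original post)
import numpy as np
import scipy.io.wavfile as wav
import speechpy

# AUDIO_TIME_STEPS is a constant defined elsewhere in the project (not shown)

def audio_processing(wav_file, verbose=True):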

    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))

    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")

    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)

    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS

    if verbose:
        print("Number of audio clips:", number_of_audio_clips)

    # Don't consider the first MFCC feature, only consider the next 12 (Checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]

    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)

    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features

Error:

AssertionError: Signal dimention should be of the format of (N,) but it is (691200, 2) instead
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2548, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 673, in modelprediction
audio_fea = audio_processing(audio, False)
File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 49, in audio_processing
mfcc_features = speechpy.feature.mfcc(
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 139, in mfcc
feature, energy = mfe(signal, sampling_frequency=sampling_frequency,
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 185, in mfe
frames = processing.stack_frames(
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\processing.py", line 90, in stack_frames
assert sig.ndim == 1, s % str(sig.shape)

CodePudding user response:

Your audio file appears to contain two channels (stereo). You can confirm this by checking the shape of the array that the wav.read function returns: sig.shape.
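As a quick illustration of that check, here is a minimal sketch that uses a synthetic NumPy array as a stand-in for the data returned by wav.read (the zero-filled array is an assumption made for illustration; its shape is copied from the error message in the question):

```python
import numpy as np

# Synthetic stand-in for the samples returned by wav.read:
# 691200 frames with 2 channels, as reported in the traceback.
sig = np.zeros((691200, 2), dtype=np.int16)

print(sig.shape)  # (691200, 2)
print(sig.ndim)   # 2

# A mono file would instead give a 1-D array of shape (N,), i.e. sig.ndim == 1.
```

A second dimension of size 2 means each row holds one sample per channel (left and right).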

The speechpy.feature.mfcc function expects single-channel (mono) audio, which is what the assertion sig.ndim == 1 in the traceback enforces. One option is to convert the audio to a single channel, for example by averaging the two channels:

sig = np.mean(sig, axis=1)

If you want the function to work on both single-channel and multi-channel data, compute the mean only when the signal is multi-channel:

if sig.ndim == 2:
    sig = np.mean(sig, axis=1)
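Put together, the conditional averaging could be wrapped in a small helper and called right after wav.read. This is only a sketch: the name to_mono is hypothetical (it is not part of speechpy or SciPy), and the tiny int16 array stands in for real audio samples.

```python
import numpy as np

def to_mono(sig):
    """Return a 1-D signal, averaging the channels if sig has shape (N, channels).

    Hypothetical helper for illustration; not part of speechpy or SciPy.
    """
    sig = np.asarray(sig)
    if sig.ndim == 2:
        # For integer input, np.mean computes in float64 by default,
        # so averaging int16 samples does not overflow.
        sig = sig.mean(axis=1)
    return sig

# Tiny stand-in for stereo int16 audio: 3 frames, 2 channels.
stereo = np.array([[100, 300], [-200, 0], [32767, 32767]], dtype=np.int16)
mono = to_mono(stereo)

print(mono.shape)     # (3,)
print(mono.tolist())  # [200.0, -100.0, 32767.0]
```

Inside audio_processing(), the equivalent step would be sig = to_mono(sig) placed between the wav.read call and the speechpy.feature.mfcc call. Note that the averaged signal is float64 rather than the original integer type.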