AssertionError: Signal dimention should be of the format of (N,) but it is (743424, 2) instead

For my ML project, I'm using a model that takes a video and its audio as input files to detect a synthetic voice in the video.

But it raises an error in the audio_processing() function:

Code for audio_processing():

def audio_processing(wav_file, verbose=True):

    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))

    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")

    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)

    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS

    if verbose:
        print("Number of audio clips:", number_of_audio_clips)

    # Don't consider the first MFCC feature, only consider the next 12 (Checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]

    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)

    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features

Error:

AssertionError: Signal dimention should be of the format of (N,) but it is (691200, 2) instead

File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2548, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.handle_exception(e)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2525, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1822, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 673, in modelprediction
audio_fea = audio_processing(audio, False)
File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 49, in audio_processing
mfcc_features = speechpy.feature.mfcc(
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 139, in mfcc
feature, energy = mfe(signal, sampling_frequency=sampling_frequency,
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 185, in mfe
frames = processing.stack_frames(
File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\processing.py", line 90, in stack_frames
assert sig.ndim == 1, s % str(sig.shape)

CodePudding user response:

From the looks of it, your audio file contains two channels. You can check this by looking at the shape of the array that the wav.read function returns (sig.shape): the error message reports (691200, 2), which reads as 691200 samples in each of 2 channels.

The speechpy.feature.mfcc function expects single-channel audio, as the assertion sig.ndim == 1 in the traceback shows. I believe you can convert your audio to a single channel, for example by averaging the two channels:

sig = np.mean(sig, axis=1)
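
As a quick sanity check, here is a minimal sketch on a synthetic array. The array is only a stand-in for the stereo data that wav.read returned (the shape is taken from the error message, and the int16 dtype and sample values are assumptions for illustration). Averaging over axis 1 collapses the channel dimension:

```python
import numpy as np

# Synthetic stand-in for a stereo signal: 691200 samples, 2 channels.
# int16 is assumed here because it is a common WAV sample format.
stereo = np.zeros((691200, 2), dtype=np.int16)
stereo[:, 0] = 100  # first channel
stereo[:, 1] = 300  # second channel

# Average the two channels sample by sample.
mono = np.mean(stereo, axis=1)

print(stereo.shape)  # (691200, 2)
print(mono.shape)    # (691200,)
print(mono.dtype)    # float64
print(mono[0])       # 200.0
```

Note that np.mean promotes integer samples to float64, so the mono signal no longer has the original integer dtype.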

If you want your function to work on both single-channel and multi-channel data, compute the mean only when the signal is multi-channel:

if sig.ndim == 2:
    sig = np.mean(sig, axis=1)
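
The same check could also be wrapped in a small helper. This is only a sketch: the name to_mono is hypothetical and belongs neither to speechpy nor to the original code, and the test arrays are made up for illustration.

```python
import numpy as np

def to_mono(sig):
    """Return a 1-D signal: average the channels of an (N, channels) array,
    pass a 1-D array through unchanged. (Hypothetical helper.)"""
    sig = np.asarray(sig)
    if sig.ndim == 2:
        # Average across the channel axis; integer input becomes float64.
        return sig.mean(axis=1)
    if sig.ndim == 1:
        return sig
    raise ValueError("Expected a 1-D or 2-D signal, got shape {}".format(sig.shape))

# Made-up inputs: one mono signal and one stereo signal built from it.
mono_in = np.arange(5, dtype=np.int16)                # shape (5,)
stereo_in = np.stack([mono_in, mono_in + 2], axis=1)  # shape (5, 2)

print(to_mono(mono_in).shape)    # (5,)
print(to_mono(stereo_in).shape)  # (5,)
print(to_mono(stereo_in))        # [1. 2. 3. 4. 5.]
```

In audio_processing(), the call would go right after wav.read and before speechpy.feature.mfcc, for example sig = to_mono(sig).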