I am working on an emotion classification project that uses audio and text. I passed the audio and the text through a 1D CNN and got output arrays with the following shapes:
audio_features_shape = (396, 63, 64)
text_features_shape = (52, 1, 64)
Now I want to stack these two arrays, which have different dimensions, into one so that I can pass a single array to an LSTM. I want the resulting shape to be:
expected_array_shape = (448, 64, 128)
I tried the following methods, but none of them gives the output I want.
x = np.column_stack((audio_features, text_features))
x = np.concatenate((audio_features,text_features), axis=2)
x = np.append(audio_features, text_features)
x = np.transpose([np.tile(audio_features, len(text_features)), np.repeat(text_features, len(audio_features))])
x = np.array([np.append(text_features,x) for x in audio_features])
Any help would be appreciated. Thanks!
CodePudding user response:
How are the values of the 2 arrays supposed to be distributed in the result?
audio_features_shape = (396, 63, 64)
text_features_shape = (52, 1, 64)
text_features should be "expanded" to (52, 63, 64), either by repeating its values 63 times along the middle axis, or by placing the array into a target array of 0s. In either case it will be 63 times larger.
Once the arrays match on all but the first dimension, they can be concatenated along that first dimension.
But the real question is: what makes sense for the LSTM?
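The two expansion options described above can be sketched in NumPy roughly as follows. This is only an illustration: the random arrays are stand-ins for the real CNN outputs, and the names text_repeated, text_padded and pad_width are made up for this example.

```python
import numpy as np

# Random stand-ins for the CNN outputs; only the shapes come from the question.
rng = np.random.default_rng(0)
audio_features = rng.normal(size=(396, 63, 64))
text_features = rng.normal(size=(52, 1, 64))

steps = audio_features.shape[1]  # 63 time steps in the audio features

# Option 1: repeat the single text step 63 times along the middle axis.
text_repeated = np.repeat(text_features, steps, axis=1)

# Option 2: keep the text step at index 0 and fill the remaining 62 steps with zeros.
pad_width = ((0, 0), (0, steps - text_features.shape[1]), (0, 0))
text_padded = np.pad(text_features, pad_width, mode="constant")

# Now all dimensions except the first match, so the arrays can be joined on axis 0.
repeat_features = np.concatenate((audio_features, text_repeated), axis=0)
pad_features = np.concatenate((audio_features, text_padded), axis=0)

print(repeat_features.shape, pad_features.shape)
```

Both results have shape (448, 63, 64), which is not the (448, 64, 128) asked for in the question. Which of the two variants is appropriate depends on how the LSTM is meant to treat the text samples.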
CodePudding user response:
Depending on what exactly you want, and if you are happy to use only TensorFlow, you could give the following a try:
import tensorflow as tf

# Random tensors standing in for the CNN outputs
audio_features = tf.random.normal((396, 63, 64))
text_features = tf.random.normal((52, 1, 64))

# Option 1: repeat the single text step until it has 63 steps, then concatenate on axis 0
text_features = tf.repeat(text_features, repeats=(audio_features.shape[1]-text_features.shape[1]) + 1, axis=1)
repeat_features = tf.concat([audio_features, text_features], axis=0)

# Option 2: start again from the original text shape and zero-pad the middle axis to 63 steps
text_features = tf.random.normal((52, 1, 64))
paddings = tf.constant([[0, 0], [0, audio_features.shape[1]-text_features.shape[1]], [0, 0]])
pad_features = tf.concat([audio_features, tf.pad(text_features, paddings, "CONSTANT")], axis=0)

print('Using tf.repeat --> ', audio_features.shape, text_features.shape, repeat_features.shape)
print('Using tf.pad --> ', audio_features.shape, text_features.shape, pad_features.shape)
Using tf.repeat --> (396, 63, 64) (52, 1, 64) (448, 63, 64)
Using tf.pad --> (396, 63, 64) (52, 1, 64) (448, 63, 64)
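If the zero-padded variant is used, it may help to keep track of which time steps hold real data and which are padding. A minimal NumPy sketch of such a boolean mask, assuming the text rows come after the audio rows as in the concatenation above (the names n_audio, n_text, steps and mask are made up for this example):

```python
import numpy as np

# Sizes taken from the shapes above: 396 audio rows, 52 text rows, 63 time steps.
n_audio, n_text, steps = 396, 52, 63

# True marks a real time step, False marks zero padding.
mask = np.zeros((n_audio + n_text, steps), dtype=bool)
mask[:n_audio, :] = True   # audio rows: all 63 steps are real
mask[n_audio:, 0] = True   # text rows: only the first step is real

print(mask.shape, int(mask.sum()))
```

A mask of this shape could, for instance, be passed as the mask argument when calling a Keras LSTM layer, so that the padded steps are skipped. Whether that is the right choice depends on the model and is an assumption here, not something the question specifies.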