Suppose I have two numpy arrays of different shapes:
np array 1 shape (300, 15111)
np array 2 shape (50, 10465)
I want to pad np array 2 so that its second dimension matches 15111, because later on I want to concatenate the two arrays. My final array would have shape (350, 15111), i.e., it would contain the 300 instances from np array 1 and the 50 instances from np array 2, all with the same number of "columns".
I am trying to do the following:
raw_inputs = [np array 1, np array 2]
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs,
padding="post")
print(padded_inputs)
But I seem to be going in the wrong direction, because I get the error below:
ValueError Traceback (most recent call last)
<ipython-input-108-6d06055c9929> in <module>
2
3 padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs,
----> 4 padding="post")
5
6 print(padded_inputs)
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/sequence.py in
pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
100 raise ValueError('Shape of sample %s of sequence at position %s '
101 'is different from expected shape %s' %
--> 102 (trunc.shape[1:], idx, sample_shape))
103
104 if padding == 'post':
ValueError: Shape of sample (10465,) of sequence at position 1 is different from
expected shape (15111,)
In addition, I don't know how to concatenate these two np arrays once they have the same number of columns.
I would really appreciate any help!
CodePudding user response:
Maybe try something like this:
import tensorflow as tf
array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))
difference = tf.shape(array1)[-1] - tf.shape(array2)[-1]
# post padding
array2 = tf.concat([array2, tf.zeros((50, difference))], axis=-1)
final_array = tf.concat([array1, array2], axis=0)
final_array.shape
# TensorShape([350, 15111])
The logic is exactly the same with numpy:
import numpy as np
array1 = np.random.random((300, 15111))
array2 = np.random.random((50, 10465))
difference = array1.shape[-1] - array2.shape[-1]
array2 = np.concatenate([array2, np.zeros((50, difference))], axis=-1)
final_array = np.concatenate([array1, array2], axis=0)
final_array.shape
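The same pad-then-concatenate idea can be wrapped in a small helper for any number of 2-D arrays. This is only a sketch: the name pad_and_stack and its value parameter are made up for illustration and are not part of numpy, and small shapes stand in for (300, 15111) and (50, 10465).

```python
import numpy as np

def pad_and_stack(arrays, value=0.0):
    """Right-pad each 2-D array with `value` up to the widest column count, then stack the rows."""
    width = max(a.shape[1] for a in arrays)
    padded = [
        # Append a block of (rows, missing columns) filled with `value` on the right
        np.concatenate(
            [a, np.full((a.shape[0], width - a.shape[1]), value, dtype=a.dtype)],
            axis=1,
        )
        for a in arrays
    ]
    return np.concatenate(padded, axis=0)

# Small stand-ins for the (300, 15111) and (50, 10465) arrays
a = np.ones((3, 5))
b = np.ones((2, 3))
stacked = pad_and_stack([a, b])
print(stacked.shape)  # (5, 5)
```

With the real arrays, pad_and_stack([array1, array2]) should give shape (350, 15111).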
Or with tf.keras.preprocessing.sequence.pad_sequences:
import tensorflow as tf
array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))
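# pad_sequences defaults to dtype="int32" and padding="pre", so set both explicitly
# to keep the float values and to pad at the end of each row
array2 = tf.keras.preprocessing.sequence.pad_sequences(array2.numpy(), maxlen=array1.shape[-1], dtype="float32", padding="post")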
final_array = tf.concat([array1, array2], axis=0)
final_array.shape
CodePudding user response:
You could use numpy.concatenate instead.
import numpy as np
small_array = np.ones((50, 10465))
big_array = np.ones((300, 15111))
# create the padding to concatenate
pad = np.zeros((small_array.shape[0], big_array.shape[1]-small_array.shape[1]))
new_array = np.concatenate((small_array, pad), axis=1)
print(new_array.shape) # prints (50, 15111)
complete_array = np.concatenate((big_array, new_array), axis=0)
print(complete_array.shape) # prints (350, 15111)
If you pad with numpy.pad and pass a single number as pad_width, that padding is applied to both edges of every axis. But you only want to pad the end of axis=1.
To clarify:
import numpy as np
array = np.ones((2, 2))
new_array = np.pad(array, 2)
print(new_array.shape) # produces (6, 6) which is not what you want.
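numpy.pad can still do the job if pad_width is given per axis as (before, after) pairs, so that only the end of axis=1 is padded. A minimal sketch, with small shapes standing in for the real ones and variable names chosen only for illustration:

```python
import numpy as np

small_array = np.ones((2, 3))
big_array = np.ones((4, 5))

# Number of columns missing from the smaller array
extra_cols = big_array.shape[1] - small_array.shape[1]

# ((0, 0), (0, extra_cols)): no padding on axis 0, pad only the end of axis 1
padded = np.pad(
    small_array,
    pad_width=((0, 0), (0, extra_cols)),
    mode="constant",
    constant_values=0,
)
print(padded.shape)  # (2, 5)

complete_array = np.concatenate((big_array, padded), axis=0)
print(complete_array.shape)  # (6, 5)
```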