Padding one NumPy array to match the number of columns of another NumPy array

Suppose I have two NumPy arrays of different shapes:

np array 1 shape (300, 15111)

np array 2 shape (50, 10465)

I want to pad np array 2 so that its number of columns matches 15111. I want to do this because later on I want to concatenate these two arrays. So my final array would have shape (350, 15111), i.e., it would contain the 300 instances from np array 1 and the 50 instances from np array 2, all with the same number of "columns".

I am trying to do the following:

raw_inputs = [np_array_1, np_array_2]  # the two arrays described above

padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
padding="post")

print(padded_inputs)

But I seem to be heading in the wrong direction, because I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-108-6d06055c9929> in <module>
      2 
      3 padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
----> 4 padding="post")
      5 
      6 print(padded_inputs)

1 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/sequence.py in pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
    100             raise ValueError('Shape of sample %s of sequence at position %s '
    101                              'is different from expected shape %s' %
--> 102                              (trunc.shape[1:], idx, sample_shape))
    103 
    104         if padding == 'post':

ValueError: Shape of sample (10465,) of sequence at position 1 is different from expected shape (15111,)

In addition, I don't know how to concatenate these two NumPy arrays once they have the same number of columns.

I would really appreciate any help!

Answer:

Maybe try something like this:

import tensorflow as tf

array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))

difference = tf.shape(array1)[-1] - tf.shape(array2)[-1]

# post padding
array2 = tf.concat([array2, tf.zeros((50, difference))], axis=-1) 

final_array = tf.concat([array1, array2], axis=0)
final_array.shape
# TensorShape([350, 15111])

The logic is exactly the same with numpy:

import numpy as np

array1 = np.random.random((300, 15111))
array2 = np.random.random((50, 10465))

difference = array1.shape[-1] - array2.shape[-1]

array2 = np.concatenate([array2, np.zeros((50, difference))], axis=-1)

final_array = np.concatenate([array1, array2], axis=0)
final_array.shape
# (350, 15111)
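To make the effect of these two steps visible, here is a minimal sketch of the same NumPy logic on small, made-up arrays (a 3×5 array of ones and a 2×3 array of twos, chosen only for illustration) whose contents can actually be printed:

```python
import numpy as np

# Small stand-ins for the (300, 15111) and (50, 10465) arrays (illustrative values)
array1 = np.ones((3, 5))
array2 = np.full((2, 3), 2.0)

# Number of zero columns needed so array2 has as many columns as array1
difference = array1.shape[-1] - array2.shape[-1]

# Append the zero columns on the right ("post" padding), then stack the rows
array2 = np.concatenate([array2, np.zeros((array2.shape[0], difference))], axis=-1)
final_array = np.concatenate([array1, array2], axis=0)

print(final_array.shape)  # (5, 5)
print(final_array)
# [[1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]
#  [2. 2. 2. 0. 0.]
#  [2. 2. 2. 0. 0.]]
```

The rows of the first array come first, and the rows of the second array follow with zeros filling the columns it was missing.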

Or with tf.keras.preprocessing.sequence.pad_sequences:

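import tensorflow as tf

array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))

# pad_sequences defaults to dtype="int32" (which would truncate the float values
# and make tf.concat fail on mismatched dtypes) and to padding="pre"; override both
array2 = tf.keras.preprocessing.sequence.pad_sequences(
    array2.numpy(), maxlen=array1.shape[-1], dtype="float32", padding="post")
final_array = tf.concat([array1, array2], axis=0)
final_array.shape
# TensorShape([350, 15111])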

Answer:

You could use numpy.concatenate instead.

import numpy as np
small_array = np.ones((50, 10465))
big_array = np.ones((300, 15111))

# create the padding to concatenate
pad = np.zeros((small_array.shape[0], big_array.shape[1]-small_array.shape[1]))

new_array =  np.concatenate((small_array, pad), axis=1)
print(new_array.shape)  # prints (50, 15111)

complete_array = np.concatenate((big_array, new_array), axis=0)
print(complete_array.shape)  # prints (350, 15111)
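The same approach generalizes to any number of arrays. Below is a minimal sketch, assuming all inputs are 2-D and should be right-padded to the widest one; `stack_with_padding` is a hypothetical helper name used for illustration, not a NumPy function:

```python
import numpy as np


def stack_with_padding(arrays, fill_value=0):
    """Hypothetical helper: right-pad each 2-D array with fill_value up to the
    widest column count, then stack all rows into a single array."""
    width = max(a.shape[1] for a in arrays)
    padded = []
    for a in arrays:
        # Block of fill values covering this array's missing columns
        # (fill_value should be representable in a's dtype)
        pad = np.full((a.shape[0], width - a.shape[1]), fill_value, dtype=a.dtype)
        padded.append(np.concatenate((a, pad), axis=1))
    return np.concatenate(padded, axis=0)


small_array = np.ones((50, 10465))
big_array = np.ones((300, 15111))
complete_array = stack_with_padding([big_array, small_array])
print(complete_array.shape)  # (350, 15111)
```

Rows keep the order of the input list, so `big_array` rows come first here, as in the code above.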

If you use numpy.pad and pass a single number as pad_width, that amount of padding is added at both edges of every axis. But you only want to pad the end of axis 1.

To clarify:

import numpy as np
array = np.ones((2, 2))
new_array = np.pad(array, 2)
print(new_array.shape)  # produces (6, 6) which is not what you want.
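That said, numpy.pad also accepts a separate (before, after) pair for each axis, so passing ((0, 0), (0, difference)) pads only the end of axis 1. A minimal sketch with the shapes from the question; the "pre" variant is included only to show where the zeros would go if you padded on the left instead:

```python
import numpy as np

small_array = np.ones((50, 10465))
big_array = np.ones((300, 15111))
difference = big_array.shape[1] - small_array.shape[1]

# One (before, after) pair per axis: nothing added on axis 0,
# `difference` zero columns appended after the existing ones on axis 1
post_padded = np.pad(small_array, ((0, 0), (0, difference)),
                     mode="constant", constant_values=0)
print(post_padded.shape)  # (50, 15111)

# Zero columns inserted before the existing ones instead ("pre" padding)
pre_padded = np.pad(small_array, ((0, 0), (difference, 0)))
print(pre_padded.shape)  # (50, 15111)

complete_array = np.concatenate((big_array, post_padded), axis=0)
print(complete_array.shape)  # (350, 15111)
```

This gives the same result as building the zero block by hand and calling numpy.concatenate along axis=1.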