Padding one NumPy array to match the number of columns of another NumPy array

Time:08-24

Suppose I have two NumPy arrays of different shapes:

np array 1 shape (300, 15111)

np array 2 shape (50, 10465)

I want to pad np array 2 so that it has 15111 columns, because later on I want to concatenate the two arrays. My final array would have shape (350, 15111), i.e., the 300 instances from np array 1 followed by the 50 instances from np array 2, all with the same number of "columns".

I am trying to do the following:

raw_inputs = [array1, array2]  # np array 1 and np array 2 from above

padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
padding="post")

print(padded_inputs)

But this seems to be the wrong approach, because I get the error below:


ValueError                                Traceback (most recent call last)
<ipython-input-108-6d06055c9929> in <module>
      2 
      3 padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
----> 4 padding="post")
      5 
      6 print(padded_inputs)

1 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/sequence.py in 
pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
    100             raise ValueError('Shape of sample %s of sequence at position %s '
    101                              'is different from expected shape %s' %
--> 102                              (trunc.shape[1:], idx, sample_shape))
    103 
    104         if padding == 'post':

ValueError: Shape of sample (10465,) of sequence at position 1 is different from expected shape (15111,)

In addition, I don't know how to concatenate the two arrays once they have the same number of columns.

I would really appreciate any help!

CodePudding user response:

Maybe try something like this:

import tensorflow as tf

array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))

difference = tf.shape(array1)[-1] - tf.shape(array2)[-1]

# post padding
array2 = tf.concat([array2, tf.zeros((50, difference))], axis=-1) 

final_array = tf.concat([array1, array2], axis=0)
final_array.shape
# TensorShape([350, 15111])

The logic is exactly the same with numpy:

import numpy as np

array1 = np.random.random((300, 15111))
array2 = np.random.random((50, 10465))

difference = array1.shape[-1] - array2.shape[-1]

array2 = np.concatenate([array2, np.zeros((50, difference))], axis=-1)

final_array = np.concatenate([array1, array2], axis=0)
final_array.shape
# (350, 15111)

Or with tf.keras.preprocessing.sequence.pad_sequences:

import tensorflow as tf

array1 = tf.random.normal((300, 15111))
array2 = tf.random.normal((50, 10465))

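# pad_sequences defaults to dtype="int32" and padding="pre",
# so set both explicitly to keep the float values and pad at the end
array2 = tf.keras.preprocessing.sequence.pad_sequences(
    array2.numpy(), maxlen=array1.shape[-1], dtype="float32", padding="post")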
final_array = tf.concat([array1, array2], axis=0)
final_array.shape

CodePudding user response:

You could use numpy.concatenate instead.

import numpy as np
small_array = np.ones((50, 10465))
big_array = np.ones((300, 15111))

# create the padding to concatenate
pad = np.zeros((small_array.shape[0], big_array.shape[1]-small_array.shape[1]))

new_array = np.concatenate((small_array, pad), axis=1)
print(new_array.shape)  # prints (50, 15111)

complete_array = np.concatenate((big_array, new_array), axis=0)
print(complete_array.shape)  # prints (350, 15111)
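
If you end up with more than two arrays, the same idea can be wrapped in a small helper. This is only a sketch: pad_columns_and_stack is a made-up name (not a NumPy function), and it assumes every input is 2-D, that padding goes on the right, and that a float result is acceptable. Small shapes stand in for the real ones.

```python
import numpy as np

def pad_columns_and_stack(arrays, fill_value=0.0):
    """Hypothetical helper: right-pad 2-D arrays to the widest one, then stack the rows."""
    width = max(a.shape[1] for a in arrays)
    padded = []
    for a in arrays:
        # Start from a block of fill_value and copy the data into its left part
        out = np.full((a.shape[0], width), fill_value, dtype=float)
        out[:, :a.shape[1]] = a
        padded.append(out)
    return np.concatenate(padded, axis=0)

big_array = np.ones((3, 5))
small_array = np.ones((2, 3))

result = pad_columns_and_stack([big_array, small_array])
print(result.shape)  # (5, 5)
```

The rows keep the order of the input list, so the padded rows from small_array come last.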

If you pad with numpy.pad and pass a single number as pad_width, that padding is applied to both edges of every axis. But you only want to pad the end of axis 1.

To clarify:

import numpy as np
array = np.ones((2, 2))
new_array = np.pad(array, 2)
print(new_array.shape)  # produces (6, 6) which is not what you want.
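
numpy.pad can still do the job if pad_width is given as one (before, after) pair per axis, so that only the end of axis 1 is padded. A small sketch, with made-up small shapes standing in for the real arrays:

```python
import numpy as np

small_array = np.ones((2, 3))
big_array = np.ones((4, 5))

extra_cols = big_array.shape[1] - small_array.shape[1]

# One (before, after) pair per axis:
# axis 0 is left alone, axis 1 gets extra_cols zeros appended at the end
new_array = np.pad(small_array, ((0, 0), (0, extra_cols)),
                   mode="constant", constant_values=0)
print(new_array.shape)  # (2, 5)

complete_array = np.concatenate((big_array, new_array), axis=0)
print(complete_array.shape)  # (6, 5)
```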