Append an element to a 2D NumPy array

I have a numpy array that has a shape of (500, 151296). Below is the array format

array:

array([[-0.18510018,  0.13180602,  0.32903048, ...,  0.39744213,
        -0.01461623,  0.06420607],
       [-0.14988784,  0.12030973,  0.34801325, ...,  0.36962894,
         0.04133283,  0.04434045],
       [-0.3080041 ,  0.18728344,  0.36068922, ...,  0.09335024,
        -0.11459247,  0.10187756],
       ...,
       [-0.17399777, -0.02492459, -0.07236133, ...,  0.08901921,
        -0.17250113,  0.22222663],
       [-0.17399777, -0.02492459, -0.07236133, ...,  0.08901921,
        -0.17250113,  0.22222663],
       [-0.17399777, -0.02492459, -0.07236133, ...,  0.08901921,
        -0.17250113,  0.22222663]], dtype=float32)

array[0]:

array([-0.18510018,  0.13180602,  0.32903048, ...,  0.39744213,
       -0.01461623,  0.06420607], dtype=float32)

I also have a list of stopwords whose length matches the first dimension of the array:

stopwords = ['no', 'not', 'in' .........]

I want to append each stopword to the corresponding row of the array, which has 500 rows. Below is the code that I am using:

for i in range(len(stopwords)):
  array = np.append(array[i], str(stopwords[i]))

I am getting the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-45-361e2cf6519b> in <module>
      1 for i in range(len(stopwords)):
----> 2   array = np.append(array[i], str(stopwords[i]))

IndexError: index 2 is out of bounds for axis 0 with size 2

Desired output:

array[0]:

array([-0.18510018,  0.13180602,  0.32903048, ...,  0.39744213,
       -0.01461623,  0.06420607, 'no'], dtype=float32)

Can anyone tell me what I am doing wrong?

CodePudding user response：

What you are doing wrong is that you overwrite the variable array inside the for loop:

for i in range(len(stopwords)):
    array = np.append(array[i], str(stopwords[i]))
#   ^^^^^             ^^^^^

The other thing you are doing wrong is using np.append in a for loop, which is almost always a bad idea, because every call copies the data into a new array.
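To make that concrete, here is a minimal sketch showing that np.append never modifies its argument and always returns a freshly allocated array. The names base and grown are made up for illustration.

```python
import numpy as np

base = np.zeros(3)
grown = np.append(base, 1.0)  # returns a new array; base is left untouched

print(base.shape)                     # (3,)
print(grown.shape)                    # (4,)
print(np.shares_memory(base, grown))  # False
```

Calling this once per row therefore copies the data over and over, and assigning the result back to the original name throws the original array away.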

You could instead do something like:

from string import ascii_letters
from random import choices

import numpy as np

N, M = 50, 7
arr = np.random.randn(N, M)
stopwords = np.array(["".join(choices(ascii_letters, k=10)) for _ in range(N)])
result = np.concatenate([arr, stopwords[:, None]], axis=-1)

assert result.shape == (N, M + 1)
print(result[0])  # ['0.1' '-1.2' '-0.1' '1.6' '-1.4' '-0.2' '1.7' 'ybWyFlqhcS']

But this is also wrong: it mixes data types for no apparent reason, and every number in the result has been turned into a string.
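A small sketch of what that costs, using made-up data (vectors and words are illustrative names). The numbers are converted to strings explicitly here so that the two parts share a dtype; after that they have to be parsed back before any arithmetic.

```python
import numpy as np

vectors = np.array([[0.5, 1.5], [2.5, 3.5]])
words = np.array(["no", "not"])

# Convert the numbers to strings so both parts have a string dtype.
mixed = np.concatenate([vectors.astype(str), words[:, None]], axis=1)

print(mixed.dtype.kind)  # U  (every element is now a string)
print(mixed[0])          # ['0.5' '1.5' 'no']

# The numeric columns must be parsed back to floats before doing maths.
restored = mixed[:, :-1].astype(float)
print(restored.sum())    # 8.0
```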

IMHO, you had better just keep the two arrays separate.

Depending on what you are doing, you can iterate over them as follows:

for vector, stopword in zip(arr, stopwords):
    print(f"{stopword = }")
    print(f"{vector   = }")

# stopword = 'RgfTVGzPOl'
# vector   = array([-0.9,  1.1,  0.7 , -0.3 , -0.7 , -0.7, -0.6])
# 
# stopword = 'XlJqKdsvCC'
# vector   = array([-0.5,  0.1, -0.7 , -0.6, -1.1, -0.6, -0.6])
# 
#...
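If you need to look a vector up by its word rather than iterate, one possible option is a plain dict built from the two sequences. This is only a sketch and assumes that the stopwords are unique (a repeated word would overwrite the earlier entry); by_word is a name made up for the example.

```python
import numpy as np

vectors = np.arange(6, dtype=np.float32).reshape(3, 2)
stopwords = ["no", "not", "in"]

# Map each word to its row. The rows are views into `vectors`, not copies.
by_word = dict(zip(stopwords, vectors))

print(by_word["not"])        # [2. 3.]
print(by_word["not"].dtype)  # float32
```

The numeric data stays in one float32 array, and the words stay ordinary Python strings.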

CodePudding user response：

Let's try some debugging.

Start with a smaller float array:

In [76]: arr = np.arange(12).reshape(3,4).astype(float)    
In [77]: arr
Out[77]: 
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [78]: words = ['no','not','in']

In [79]: for i in range(3):
    ...:     arr = np.append(arr[i], str(words[i]))
    ...:     
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [79], in <cell line: 1>()
      1 for i in range(3):
----> 2     arr = np.append(arr[i], str(words[i]))

IndexError: index 2 is out of bounds for axis 0 with size 2

Look at i and arr when you get the error:

In [80]: arr
Out[80]: array(['1.0', 'not'], dtype='<U3')    
In [81]: i
Out[81]: 2

arr looks nothing like the original arr, does it? It's a 1d array with 2 string elements. It's arr[2] that's raising the error. Do you understand why?

Recreate arr, and perform just one step:

In [82]: arr = np.arange(12).reshape(3,4).astype(float)
In [83]: np.append(arr[0], words[0])
Out[83]: array(['0.0', '1.0', '2.0', '3.0', 'no'], dtype='<U32')

That looks a bit like what you want for the first row, except it is string dtype. But you don't want to replace the original arr with this 1d array, do you?

Doing the i=1 step on this result produces

In [84]: np.append(Out[83][1], words[1])
Out[84]: array(['1.0', 'not'], dtype='<U3')

This is the array that the i=2 step fails on: it has shape (2,), so there is no index 2.

Don't just throw up your hands in despair when you get an error - debug by looking at variables, and testing the code step by step.
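As a sketch of that kind of step-by-step check, the loop below records the shape of arr at the start of every iteration and reports where it fails. It uses the same small arr and words as in the session above; the shapes list is added only for the illustration.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4).astype(float)
words = ["no", "not", "in"]

shapes = []
for i in range(3):
    shapes.append(arr.shape)  # shape seen at the start of step i
    try:
        arr = np.append(arr[i], str(words[i]))
    except IndexError as exc:
        print(f"step {i} failed: {exc}")
        break

print(shapes)  # [(3, 4), (5,), (2,)]
```

The 2d array is gone after the very first step, which is why the later indexing fails.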

The kind of iteration that you attempt does work for lists:

In [85]: alist = arr.tolist()  
In [86]: alist
Out[86]: [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]

In [87]: for i in range(3):
    ...:     alist[i].append(words[i])
    ...:     

In [88]: alist
Out[88]: 
[[0.0, 1.0, 2.0, 3.0, 'no'],
 [4.0, 5.0, 6.0, 7.0, 'not'],
 [8.0, 9.0, 10.0, 11.0, 'in']]

The elements of a list can differ in length; list append works in-place; lists can contain numbers and strings. None of this holds true for numpy arrays.
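A short sketch of those three differences on the array side (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

# Not in-place: np.append returns a new array and leaves `arr` unchanged.
out = np.append(arr, [[9.0], [9.0]], axis=1)
print(arr.shape, out.shape)  # (2, 3) (2, 4)

# Fixed row length: a 4-element row does not fit into a 3-column array.
try:
    arr[0] = [0.0, 1.0, 2.0, 3.0]
    row_ok = True
except ValueError:
    row_ok = False

# Single dtype: a word cannot be stored in a float array.
try:
    arr[0, 0] = "no"
    word_ok = True
except ValueError:
    word_ok = False

print(row_ok, word_ok)  # False False
```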

As a general rule, trying to replicate list methods with numpy arrays does not work.
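If the data really does have to be built up one row at a time, a common pattern is to do the growing in a Python list and convert to an array once at the end. A minimal sketch with placeholder values:

```python
import numpy as np

rows = []                      # plain Python list used as the growing container
for i in range(3):
    rows.append([i, i * 2.0])  # list.append works in-place

arr = np.array(rows)           # single conversion at the end
print(arr.shape)               # (3, 2)
print(arr.dtype)               # float64
```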