Is it possible to add rows of different lengths to a numpy array and then add elements to those rows?

Python Version: 3.7.11
numpy Version: 1.21.2

I want to have a numpy array, something like below:

[
    ["Hi", "Anne"],
    ["How", "are", "you"],
    ["fine"]
]

But the process of creating this numpy array is not simple. It goes as follows:

# code block 1: at the beginning we have an empty numpy array.

First loop:
# code block 2: a row is added in this first loop, or rather, in this loop we find out that we need a new row.

A loop inside the first loop:
# code block 3: the elements of that row are added in this inner loop.

Assume that the number of iterations is not known in advance, that is:
- the number of columns differs from row to row, and
- we don't know how many rows we will add to the numpy array.

Maybe the code example below will help me get my point across:

a = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]

# code block 1: code for creating empty numpy array

for row in a:
    # code block 2: code for creating empty row

    for element in row:
        # code block 3: code for appending element to that row or last row
        pass  # placeholder so the skeleton is valid Python

Question:

Is it possible to create a numpy array with these steps (code block #1, #2, #3)?

If yes, how?

CodePudding user response：

Numpy arrays are not optimised for rows of inconsistent length, so using them this way is not good practice. You can only do this by giving the array dtype=object, so that each element is a Python list object rather than a string. But like I said, numpy is not the way to go for this.

import numpy
a = numpy.array([["Hi", "Anne"], ["How", "are", "you"], ["fine"]], dtype=object)
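
As a rough sketch of what that line produces (the names `ragged` and `arr` are illustrative, not from the answer): the result is a one-dimensional array whose three elements are ordinary Python lists, so "adding an element to a row" is really a list append, not a numpy operation.

```python
import numpy as np

ragged = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]

# dtype=object tells numpy to store each inner list as a single element.
arr = np.array(ragged, dtype=object)

print(arr.shape)         # one dimension with three elements
print(type(arr[0]))      # each element is a plain Python list

# Growing a "row" uses list.append on the stored list object.
arr[2].append("thanks")
print(arr[2])
```

The array only holds references to the lists, so numpy does none of the growing itself.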

CodePudding user response：

Start with the nested list:

In [99]: alist = [
    ...:     ["Hi", "Anne"],
    ...:     ["How", "are", "you"],
    ...:     ["fine"]
    ...: ]
In [100]: alist
Out[100]: [['Hi', 'Anne'], ['How', 'are', 'you'], ['fine']]

Make an array from it:

In [101]: arr = np.array(alist)
<ipython-input-101-3fd8e9bd05a9>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.array(alist)
In [102]: arr
Out[102]: 
array([list(['Hi', 'Anne']), list(['How', 'are', 'you']), list(['fine'])],
      dtype=object)

That warning tells us that we are doing something unusual, or at least suboptimal. We can suppress the warning with dtype=object, but the result will be the same. (From numpy 1.24 on, omitting dtype=object here raises a ValueError instead of a warning.)

Look at the result: it's a 3 element array, where each element is a list. It's not a multidimensional array. We can also make an array of arrays:

In [103]: arr1 = np.array([np.array(el) for el in arr], object)
In [104]: arr1
Out[104]: 
array([array(['Hi', 'Anne'], dtype='<U4'),
       array(['How', 'are', 'you'], dtype='<U3'),
       array(['fine'], dtype='<U4')], dtype=object)

It sounds like you want to replicate this list-building loop with arrays:

In [107]: al = []
     ...: for row in alist:
     ...:     al1 = []
     ...:     for el in row:
     ...:         al1.append(el)
     ...:     al.append(al1)
     ...: 
In [108]: al
Out[108]: [['Hi', 'Anne'], ['How', 'are', 'you'], ['fine']]
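
One way to map the question's three code blocks onto this, as a minimal sketch: build everything with lists and convert to an object array once at the end. The names `rows`, `current` and `result` are placeholders chosen here, and filling a pre-allocated array slot by slot is just one of several ways to do the final conversion.

```python
import numpy as np

a = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]

rows = []                        # code block 1: start with an empty list, not an array
for row in a:
    current = []                 # code block 2: start a new, empty row
    for element in row:
        current.append(element)  # code block 3: append to the last row
    rows.append(current)

# Convert once, after all the growing is done.
# Each slot of the object array receives one complete list.
result = np.empty(len(rows), dtype=object)
for i, r in enumerate(rows):
    result[i] = r

print(result.shape)
print(result)
```

Assigning slot by slot avoids any guessing by numpy about whether the nested lists should become extra dimensions.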

But there are several problems.

There isn't a simple "empty" array; arrays can have a shape like (0,) or (0,3) or (3,0,3) etc.
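
A small illustration of that point, assuming nothing beyond numpy itself: all of the shapes below hold zero elements, yet they are different arrays, and only some of them can serve as the starting point for stacking rows of a given width.

```python
import numpy as np

# Three different "empty" arrays, each with zero elements.
for shape in [(0,), (0, 3), (3, 0, 3)]:
    empty = np.empty(shape)
    print(shape, empty.size)

# A (0, 3) array can act as the start for rows of width 3 ...
start = np.empty((0, 3))
grown = np.vstack([start, [1.0, 2.0, 3.0]])
print(grown.shape)

# ... but every row stacked onto it must have exactly 3 columns.
```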

Arrays don't have a simple and fast append. np.append does not qualify. Any attempt to "grow" an array results in making a new array with a full copy. List append just adds a pointer to an object that's designed to grow.
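
To make the difference concrete, here is a short sketch (variable names are illustrative): `np.append` returns a new array and leaves the original untouched, while `list.append` modifies the list in place.

```python
import numpy as np

arr = np.array([1, 2, 3])
grown = np.append(arr, 4)            # builds a new array with a full copy

print(arr)                           # the original is unchanged
print(grown)
print(np.shares_memory(arr, grown))  # the two arrays do not share a buffer

items = [1, 2, 3]
alias = items
items.append(4)                      # grows the same list object in place
print(alias is items, alias)
```

Calling `np.append` inside a loop therefore copies the whole array on every iteration.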

While numpy can make string dtype arrays (as in [104]), it does not have special string handling code. You must still use Python string methods to manipulate those strings.

Math on object dtype arrays is hit-or-miss and slower than math on numeric arrays. Essentially it takes place at list-comprehension speeds.
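
A sketch of what "hit-or-miss" can look like in practice. On an object array numpy hands each operation to the Python objects themselves, so the outcome depends on what those objects support. The variable names are made up for this example.

```python
import numpy as np

obj = np.array([1, 2, 3], dtype=object)
print(obj + 1)            # works: uses Python int addition per element

ragged = np.empty(2, dtype=object)
ragged[0] = [1, 2]
ragged[1] = [3]
summed = ragged + ragged  # "works", but as list concatenation, not arithmetic
print(summed)

try:
    np.sqrt(obj)          # Python ints have no sqrt method
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print("sqrt failed:", type(exc).__name__)
```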

numpy is designed for fast numerical work on multidimensional arrays. Think of things like matrix multiplication and addition. Even when numpy is used as a stepping stone to machine learning, the arrays need to be numeric and "rectangular". Ragged lists cannot be used for ML as they are.
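
If a rectangular array is what is ultimately needed, one common workaround is to pad the shorter rows. The helper below, `pad_rows`, is hypothetical and not part of numpy, and the empty string as fill value is an assumption. Whether padding is appropriate depends on what the data is for.

```python
import numpy as np

def pad_rows(rows, fill=""):
    """Pad every row to the length of the longest one (hypothetical helper)."""
    width = max((len(r) for r in rows), default=0)
    return np.array([list(r) + [fill] * (width - len(r)) for r in rows])

a = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]
padded = pad_rows(a)

print(padded.shape)   # a regular 2-D array
print(padded)
```

The result is an ordinary 2-D string array, which can be indexed and sliced like any other rectangular array.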