- Python Version: 3.7.11
- numpy Version: 1.21.2
I want to have a numpy array, something like below:
[
["Hi", "Anne"],
["How", "are", "you"],
["fine"]
]
But the process of creating this numpy array is not simple and it's as follows:
# code block 1
At the beginning we have an empty numpy array.
First loop:
# code block 2
A row is added in this first loop, or in this loop we learn that we need a new row.
A loop inside of the first loop:
# code block 3
elements of that row will be added in this inner loop.
Assume that the number of iterations is not known in advance, that is:
the number of columns in each row is different, and
we don't know the number of rows that we want to add to the numpy array.
Maybe the code example below will help get my point across:
a = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]
# code block 1: code for creating empty numpy array
for row in a:
    # code block 2: code for creating empty row
    for element in row:
        # code block 3: code for appending element to that row or last row
Question:
Is it possible to create a numpy array with these steps (code block #1, #2, #3)? If yes, how?
CodePudding user response:
Numpy arrays are not designed for rows of differing lengths, so using them this way is not good practice. You can only do this by giving the array dtype=object, so that each row is stored as a Python object rather than as strings in a rectangular array. But like I said, numpy is not the way to go for this.
import numpy
a = numpy.array([["Hi", "Anne"], ["How", "are", "you"], ["fine"]], dtype=object)
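To see what that call produces, here is a small check (a sketch only; the prints are just for inspection). The result is a one-dimensional array whose elements are Python lists, not a 2-D array of strings.

```python
import numpy

a = numpy.array([["Hi", "Anne"], ["How", "are", "you"], ["fine"]], dtype=object)

print(a.shape)      # a single axis of length 3, not a 2-D shape
print(type(a[1]))   # each element is an ordinary Python list
print(len(a[1]))    # so row lengths are free to differ
```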
CodePudding user response:
Start with the nested list:
In [99]: alist = [
    ...:     ["Hi", "Anne"],
    ...:     ["How", "are", "you"],
    ...:     ["fine"]
    ...: ]
In [100]: alist
Out[100]: [['Hi', 'Anne'], ['How', 'are', 'you'], ['fine']]
Make an array from it:
In [101]: arr = np.array(alist)
<ipython-input-101-3fd8e9bd05a9>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
arr = np.array(alist)
In [102]: arr
Out[102]:
array([list(['Hi', 'Anne']), list(['How', 'are', 'you']), list(['fine'])],
dtype=object)
That warning tells us that we are doing something unusual, or at least suboptimal. We can suppress the warning with dtype=object, but the result will be the same.
Look at the result - it's a 3 element array, where each element is a list. It's not a multidimensional array. We can make an array of arrays:
In [103]: arr1 = np.array([np.array(el) for el in arr], object)
In [104]: arr1
Out[104]:
array([array(['Hi', 'Anne'], dtype='<U4'),
array(['How', 'are', 'you'], dtype='<U3'),
array(['fine'], dtype='<U4')], dtype=object)
Sounds like you want to duplicate this list constructor with arrays:
In [107]: al = []
     ...: for row in alist:
     ...:     al1 = []
     ...:     for el in row:
     ...:         al1.append(el)
     ...:     al.append(al1)
     ...:
In [108]: al
Out[108]: [['Hi', 'Anne'], ['How', 'are', 'you'], ['fine']]
But there are several problems.
There isn't a simple "empty" array; arrays can have a shape like (0,) or (0,3) or (3,0,3) etc.
Arrays don't have a simple and fast append. np.append does not qualify. Any attempt to "grow" an array results in making a new array with a full copy. List append just adds a pointer to an object that's designed to grow.
While numpy can make string dtype arrays (as in [104]), it does not have special string handling code. You must still use python string methods to manipulate those strings.
Math on object dtype arrays is hit-or-miss and slower than math on numeric arrays. Essentially it takes place at list-comprehension speeds.
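The first two points can be checked directly. The following is a small sketch (the variable names are illustrative): several different shapes all count as "empty", and np.append hands back a new array instead of growing the original, whereas list.append modifies the same list object.

```python
import numpy as np

# "Empty" is ambiguous: each of these arrays holds zero elements.
for shape in [(0,), (0, 3), (3, 0, 3)]:
    e = np.empty(shape)
    print(e.shape, e.size)

# np.append returns a new array; the original is left untouched.
base = np.array([1, 2, 3])
grown = np.append(base, 4)
print(base.shape, grown.shape)
print(np.shares_memory(base, grown))

# list.append grows the same list object in place.
items = [1, 2, 3]
same = items
items.append(4)
print(same is items, len(same))
```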
numpy is designed for fast numerical work on multidimensional arrays. Think of things like matrix multiplication and addition. Even when used as a stepping stone to machine learning, the arrays need to be numeric and "rectangular". Ragged lists cannot be used for ML.
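If an object array is still wanted at the end, one workable pattern (a sketch, not the only option) is to do all the growing with Python lists, as in [107], and convert once when the loops are finished. The helper name rows_to_object_array is hypothetical. It fills a preallocated object array element by element so that the result stays one-dimensional even if every row happens to have the same length.

```python
import numpy as np

def rows_to_object_array(rows):
    """Hypothetical helper: pack a list of rows into a 1-D object array."""
    out = np.empty(len(rows), dtype=object)  # one slot per row
    for i, row in enumerate(rows):
        out[i] = row                         # each slot holds one Python list
    return out

a = [["Hi", "Anne"], ["How", "are", "you"], ["fine"]]

rows = []                        # stands in for code block 1: start empty
for row in a:
    new_row = []                 # code block 2: start an empty row
    for element in row:
        new_row.append(element)  # code block 3: append to the current row
    rows.append(new_row)

arr = rows_to_object_array(rows)  # single conversion after the loops
print(arr.shape, arr.dtype)
```

The only numpy work happens in the final conversion, so nothing is copied repeatedly while the rows are being collected.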