I need an array that holds either an array of records or a list of records. I need something like this:
people = [['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat']]
or:
people = (['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat'])
But I keep getting:
people = ['David','Blue','Dog','Car','Sally','Yellow','Cat','Boat']
I've tried append vs. concatenate, different axes, and different np initializations, but the result is always the same. Here is my latest version. What am I doing wrong?
import numpy as np
# Tried
# people = np.empty((0,0), dtype='S')
# people = np.array([[]])
people = np.array([])
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    # Tried
    # person = [name, color, animal, vehicle]
    person = np.array([name, color, animal, vehicle])
    # Tried this with different axis
    # people = np.append(people, person, axis=0)
    people = np.concatenate((people, person))
Thank you.
EDIT: This will be the input for a Pandas DataFrame if that helps.
CodePudding user response:
Use np.c_
people = np.c_[people, person]
CodePudding user response:
In [354]: alist = [['David','Blue','Dog','Car'],['Sally','Yellow','Cat','Boat']]
...:
In [355]: alist
Out[355]: [['David', 'Blue', 'Dog', 'Car'], ['Sally', 'Yellow', 'Cat', 'Boat']]
In [356]: np.array(alist)
Out[356]:
array([['David', 'Blue', 'Dog', 'Car'],
['Sally', 'Yellow', 'Cat', 'Boat']], dtype='<U6')
That makes a 2d array of strings. The construction is no different from the textbook example of making a 2d numeric array:
In [358]: np.array([[1, 2], [3, 4]])
Out[358]:
array([[1, 2],
       [3, 4]])
With hstack or concatenate, the lists are instead joined into one flat 1d array:
In [359]: np.hstack(alist)
Out[359]:
array(['David', 'Blue', 'Dog', 'Car', 'Sally', 'Yellow', 'Cat', 'Boat'],
      dtype='<U6')
To make an array that contains just the 2 lists, you have to initialize one first:
In [360]: arr = np.empty(2, object)
In [361]: arr
Out[361]: array([None, None], dtype=object)
In [362]: arr[:] = alist
In [363]: arr
Out[363]:
array([list(['David', 'Blue', 'Dog', 'Car']),
       list(['Sally', 'Yellow', 'Cat', 'Boat'])], dtype=object)
If the lists differ in length,
In [364]: np.array([["David", "Blue"], ["Sally", "Yellow", "Cat"]])
<ipython-input-364-be12d6dec312>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
np.array([["David", "Blue"], ["Sally", "Yellow", "Cat"]])
Out[364]:
array([list(['David', 'Blue']), list(['Sally', 'Yellow', 'Cat'])],
      dtype=object)
Read that warning - in full.
By default np.array tries to make a multidimensional array of base classes like int or string. It's only when it can't that it falls back on making an object dtype array. That kind of array should be viewed as a second-class array, and used only when it is really needed. Often the list of lists is just as good.
Your iterative creation is one such case:
people = []
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    # Tried
    # person = [name, color, animal, vehicle]
    person = np.array([name, color, animal, vehicle])
    people.append(person)
Items, whether lists or arrays (or anything else), can be added to a list in-place with just the addition of a reference. Trying to use concatenate to add items to an array is not only harder to get right, but slower, since it makes a whole new array each time. That means a lot of copying!
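As a minimal sketch of the list-append approach, assuming a hypothetical get_records() stand-in for GetRecordsFromDB() (which the question does not show), the loop can collect plain lists and convert once at the end:

```python
import numpy as np

# Hypothetical stand-in for GetRecordsFromDB(); the real function is not shown.
def get_records():
    return [
        ("David", "Blue", "Dog", "Car"),
        ("Sally", "Yellow", "Cat", "Boat"),
    ]

people = []  # an ordinary Python list, not an array
for name, color, animal, vehicle in get_records():
    # list.append stores a reference in-place; nothing is copied
    people.append([name, color, animal, vehicle])

# Convert once, after the loop; equal-length rows give a 2d string array
people_arr = np.array(people)
print(people_arr.shape)
```

Since the result is meant for a Pandas DataFrame, the list of lists can probably be passed to the DataFrame constructor as it is, without the np.array step.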
np.append is a badly named way of calling concatenate. It is not a list.append clone.
Using np.concatenate requires careful handling of dimensions. Don't be sloppy, thinking it will figure out what you want.
Similarly, this is not a clone of list []:
In [365]: np.array([])
Out[365]: array([], dtype=float64)
In [366]: np.array([]).shape
Out[366]: (0,)
It is a 1d array with a specific shape. You can only concatenate it with another 1d array, on the only axis (0).
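If you do want to stay with concatenate, here is a rough sketch of what the dimension handling could look like. The variable names and the choice of a zero-row 2d starting array are my own illustration, not something from the question:

```python
import numpy as np

david = np.array(["David", "Blue", "Dog", "Car"])
sally = np.array(["Sally", "Yellow", "Cat", "Boat"])

# Two 1d arrays joined on axis 0 give one longer 1d array.
# This is the flattened result from the question.
flat = np.concatenate((david, sally))
print(flat.shape)

# Start from a 2d array with zero rows and four columns instead,
# and turn each record into a (1, 4) row before joining.
people = np.empty((0, 4), dtype="U1")
for person in (david, sally):
    row = person[np.newaxis, :]  # shape (4,) -> (1, 4)
    people = np.concatenate((people, row), axis=0)
print(people.shape)
```

This still builds a whole new array on every pass through the loop, so the list append version remains the better choice for anything but a handful of records.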
CodePudding user response:
Here is how I solved this:
import numpy as np
people = np.array([])
records = GetRecordsFromDB()
for record in records:
    # Do some stuff
    person = np.array([name, color, animal, vehicle])
    if len(people) == 0:
        people = [person]
    else:
        people = np.append(people, [person], axis=0)