I'm trying to convert a list of Astropy Tables into a NumPy array of Astropy Tables. At first I tried `np.asarray(mylist)` and `np.array(mylist)`, but the Tables inside the list were converted, together with the list itself, into a NumPy ndarray.
Example:

```python
import numpy as np
from astropy.table import Table

t1 = Table({'a': [1, 2, 3], 'b': [4, 5, 6]})
t2 = Table({'a': [7, 8, 9], 'b': [10, 11, 12]})
mylist = [t1, t2]
print(mylist)
```
The output is:

```
[<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6,
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12]
```
If I then apply `np.array(mylist)`, the output is:

```
array([[(1, 4), (2, 5), (3, 6)],
       [(7, 10), (8, 11), (9, 12)]], dtype=[('a', '<i8'), ('b', '<i8')])
```
But what I want is this:

```
array([<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6,
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12])
```
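My current solution is:

```python
import numpy as np

def to_object_array(mylist):  # hypothetical name; the original snippet was a function body
    if isinstance(mylist, list):
        # Pre-allocate a 1D object array, then store each Table as a
        # single element so NumPy never tries to convert it.
        myarray = np.empty(len(mylist), dtype=object)
        for i, item in enumerate(mylist):
            myarray[i] = item
    else:
        myarray = mylist
    return myarray
```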
It works, but I suspect NumPy has something built in for this; I just can't find it.
Answer:
This looks like an Astropy Table limitation, which I would consider a bug. Astropy's `Table` prevents coercion to a NumPy array with an explicit data type, since that doesn't always work: there is a specific check in its code that raises a `ValueError` if a `dtype` is specified when converting a table to a NumPy array.
Of course, here you are dealing with a list, but that doesn't avoid the problem: NumPy converts the list to an array and also converts each individual element. Without a `dtype` you get a 2D array, and with a `dtype` specified you get the `ValueError` again:

```
ValueError: Datatype coercion is not allowed
```
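To see the mechanism without Astropy, here is a minimal sketch that uses plain nested Python lists as stand-ins for Tables. It is only an analogy (a Table reaches NumPy through its `__array__` method rather than as a plain sequence), but it shows the same effect: NumPy descends into the elements, `dtype=object` alone does not stop that, and filling a pre-allocated object array does.

```python
import numpy as np

# Plain lists stand in for Tables: each is something NumPy will descend into.
rows = [[1, 2, 3], [4, 5, 6]]

# NumPy turns the inner sequences into an extra dimension...
as_2d = np.array(rows)
print(as_2d.shape)  # (2, 3)

# ...and dtype=object by itself does not stop it from descending.
as_obj_2d = np.array(rows, dtype=object)
print(as_obj_2d.shape)  # (2, 3)

# Pre-allocating a 1D object array and assigning element by element
# stores each inner list as a single, opaque object.
as_1d = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    as_1d[i] = row
print(as_1d.shape)  # (2,)
print(as_1d[0])     # [1, 2, 3]
```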
The bug (as I see it) is that Astropy rejects any `dtype` other than `None`, so even `dtype=object` raises this error, which I don't think it should.
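To make the point about the check concrete, here is a toy class (`StrictTable`, hypothetical, not Astropy's actual source) whose `__array__` method rejects any explicit `dtype`, which is the kind of check described above. Calling `__array__` directly keeps the sketch independent of how a particular NumPy version forwards `dtype` to it.

```python
import numpy as np

class StrictTable:
    """Hypothetical stand-in for a table type; NOT Astropy's source."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Reject *any* explicit dtype, object included, mirroring the
        # kind of check described above. `copy` is accepted for
        # NumPy 2 compatibility but ignored in this sketch.
        if dtype is not None:
            raise ValueError("Datatype coercion is not allowed")
        return self._data

t = StrictTable([1, 2, 3])
print(t.__array__())  # [1 2 3]
try:
    t.__array__(dtype=object)
except ValueError as err:
    print(err)  # Datatype coercion is not allowed
```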
Your workaround is therefore fine, in my opinion: not ideal, but it does the job, and it's basically just two or three lines of code.
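For comparison, once such an object array exists (built as in the question's workaround), NumPy's boolean and integer indexing work on it directly and return references to the original objects rather than copies. A minimal sketch, with plain dicts standing in for Tables:

```python
import numpy as np

# Any Python objects work here; dicts stand in for Tables.
items = [{"name": "t1"}, {"name": "t2"}, {"name": "t3"}, {"name": "t4"}]

# Build the 1D object array element by element, as in the workaround.
arr = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    arr[i] = item

mask = np.array([True, False, True, False])
picked = arr[mask]  # boolean indexing
print([d["name"] for d in picked])  # ['t1', 't3']

picked2 = arr[[1, 3]]  # integer (fancy) indexing
print([d["name"] for d in picked2])  # ['t2', 't4']
```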
Since you mention boolean indexing, however, consider the following, which keeps everything in a list (I think that is the better option here: NumPy arrays are really meant for numbers, not so much for arbitrary objects):
```python
indices = [True, False, True, False]
my_list = [...]  # list of tables
# keep each table whose matching flag is True
selection = [item for item, keep in zip(my_list, indices) if keep]
```
Or, for integer indices:

```python
indices = [1, 3, 5, 6]
my_list = [...]  # list of tables
selection = [my_list[i] for i in indices]
```
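The standard library also has ready-made helpers for both kinds of selection: `itertools.compress` for a boolean mask and `operator.itemgetter` for integer indices. A minimal sketch, with strings standing in for tables:

```python
from itertools import compress
from operator import itemgetter

my_list = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]  # stand-ins for tables

# Boolean selection: compress() yields the items whose selector is truthy.
flags = [True, False, True, False]
selection = list(compress(my_list, flags))
print(selection)  # ['t0', 't2']

# Integer selection: itemgetter with several indices returns a tuple.
indices = [1, 3, 5, 6]
picked = list(itemgetter(*indices)(my_list))
print(picked)  # ['t1', 't3', 't5', 't6']
```

Like the `zip` version, `compress` stops at the shorter input, so items past the end of the mask are dropped. Also note that `itemgetter` with a single index returns the bare item rather than a one-element tuple, which is one reason the plain list comprehension is often the safer choice.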
That's the same number of lines as with NumPy indexing, and unless your list grows to thousands (or millions) of elements, you won't notice a performance difference. (If it does grow to millions of elements, you may need to reconsider your data structures anyway, which would require more rewriting elsewhere in your code.)
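If you want to check the performance claim on your own machine, a rough `timeit` sketch like the following compares list-based and NumPy boolean selection. The sizes are arbitrary, plain `object()` instances stand in for tables, the cost of building the object array is deliberately left out, and absolute numbers will vary.

```python
import timeit

import numpy as np

n = 1_000
objs = [object() for _ in range(n)]     # stand-ins for tables
mask = [i % 2 == 0 for i in range(n)]   # keep every other item

# Object array built element by element, as in the workaround.
arr = np.empty(n, dtype=object)
for i, o in enumerate(objs):
    arr[i] = o
np_mask = np.array(mask)

list_time = timeit.timeit(
    lambda: [o for o, keep in zip(objs, mask) if keep], number=100
)
numpy_time = timeit.timeit(lambda: arr[np_mask], number=100)
print(f"list: {list_time:.4f}s  numpy: {numpy_time:.4f}s")
```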