Convert a list of astropy Tables into a numpy array of astropy Tables

I am trying to convert a list of astropy Tables into a numpy array of astropy Tables. At first I tried np.asarray(list) and np.array(list), but the astropy Tables inside the list were converted along with the list into a single numpy ndarray.

Example:

import numpy as np
from astropy.table import Table

t1 = Table({'a': [1, 2, 3], 'b': [4, 5, 6]})
t2 = Table({'a': [7, 8, 9], 'b': [10, 11, 12]})
mylist = [t1, t2]
print(mylist)

The output is:

[<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12]

Then if I apply np.array(mylist), the output is:

array([[(1,  4), (2,  5), (3,  6)],
       [(7, 10), (8, 11), (9, 12)]], dtype=[('a', '<i8'), ('b', '<i8')])

but I want the following:

array([<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12])

My current solution is:

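def as_object_array(mylist):  # function name added here so the snippet runs on its own
    if isinstance(mylist, list):
        # Pre-allocate a 1-D object array, then fill it one element at a time
        myarray = np.empty(len(mylist), dtype='object')
        for i in range(len(myarray)):
            myarray[i] = mylist[i]
    else:
        # Anything that is not a list is passed through unchanged
        myarray = mylist
    return myarray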

It works, but I suspect numpy has something built in for this, and I can't find it.

CodePudding user response：

This looks to be an Astropy Table limitation, which I would consider a bug. Astropy's Table prevents coercion to a NumPy array with a requested dtype, since that doesn't always work: there is a specific check in the code that raises a ValueError if a dtype is specified when converting a table to a NumPy array.

Of course, here you are dealing with a list. But now you run into two issues: NumPy will attempt to convert the list to an array, and it applies that conversion to each individual element as well. You either get a 2D array when no dtype is specified, or the same ValueError when a dtype is specified:

ValueError: Datatype coercion is not allowed

The bug (as I consider it) is that Astropy rejects any dtype other than None. So even object as a dtype raises this error, which I'm not sure it should.

Your work-around is therefore, in my opinion, fine. Not ideal, but it does the job, and it's basically just 2-3 lines of code.
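For illustration, here is a minimal sketch of that work-around wrapped in a helper, followed by boolean indexing on the result. The name to_object_array is made up for this example, and plain lists stand in for the tables so that it runs with NumPy alone; np.array() merges equal-length lists into a 2D array in a similar way, though without the structured dtype shown in the question.

```python
import numpy as np


def to_object_array(items):
    """Return a 1-D object array whose elements are the items themselves."""
    out = np.empty(len(items), dtype=object)
    # Assign one element at a time so NumPy never tries to unpack an item.
    for i, item in enumerate(items):
        out[i] = item
    return out


# Stand-ins for the tables (an assumption for this sketch): equal-length lists.
rows = [[1, 2, 3], [7, 8, 9]]

merged = np.array(rows)        # inner lists are unpacked into a (2, 3) array
kept = to_object_array(rows)   # shape (2,): each element is still a list

mask = np.array([True, False])
picked = kept[mask]            # boolean indexing selects whole objects
```

With real tables, the same helper should be callable as to_object_array(mylist), since it does the same element-by-element assignment as the loop in the question.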

Since, however, you mention boolean indexing, consider the following, which keeps everything in a list (I think that is the better option here: NumPy arrays are really meant for numbers, not so much for objects):

indices = [True, False, True, False]
my_list = [....]  # list of tables
selection = [item for item, index in zip(my_list, indices) if index]  # filter all True values

or for numbered indices:

indices = [1, 3, 5, 6]
my_list = [....] # list of tables
selection = [my_list[i] for i in indices]

That is the same number of lines as with NumPy indexing, and unless your list grows to thousands (or millions) of elements, you wouldn't notice a performance difference. (If it does grow to millions of elements, you may need to reconsider your data structures anyway, which requires more rewriting elsewhere in your code.)
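If you prefer something ready-made over the comprehensions, the standard library covers both cases. This is only a sketch: the strings below are hypothetical stand-ins for the tables, and the index values are chosen to fit the four-element example list.

```python
from itertools import compress
from operator import itemgetter

# Hypothetical stand-ins for the tables; any objects behave the same way.
my_list = ["t0", "t1", "t2", "t3"]

# Boolean mask: compress() keeps the items whose flag is truthy.
mask = [True, False, True, False]
by_mask = list(compress(my_list, mask))

# Integer indices: itemgetter() returns a tuple when given two or more
# indices (with a single index it returns the bare item instead).
indices = [1, 3]
by_index = list(itemgetter(*indices)(my_list))
```

Both results are ordinary lists, so the tables themselves are never passed through NumPy's conversion.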