Retrieving initial lists used for creating a NumPy array

Let's say one has a NumPy array generated from lists:

import numpy as np


a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]

a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))


na = np.array(a)

Now the goal is to retrieve these initial lists from this 2D NumPy array. One solution is to reshape it in place to 5D and index along each axis:

na.shape = (4,4,4,4,4)

a1 = na[:,0,0,0,0]
a2 = na[0,:,0,0,1]
a3 = na[0,0,:,0,2]
a4 = na[0,0,0,:,3]

print(a1)
print(a2)
print(a3)
print(a4)

[1 2 3 4]
[11 22 33 44]
[111 222 333 444]
[1111 2222 3333 4444]

This works perfectly well and is my first choice. I'm simply wondering whether there's also a fancier way of doing this. Thanks.

CodePudding user response：

If the values in each original list are always unique, you could use NumPy's "unique" to find the unique values in each column (note that it returns them sorted in ascending order), like this:

#--- your code
import numpy as np

a1 = [1,2,3,4]
a2 = [11,22,33,44]
a3 = [111,222,333,444]
a4 = [1111,2222,3333,4444]

a = []
for x in a1:
    for y in a2:
        for k in a3:
            for l in a4:
                a.append((x, y, k, l))

na = np.array(a)

#--- suggested solution
original_arrays = [np.unique(column) for column in na.T]

>>> original_arrays

[array([1, 2, 3, 4]),
 array([11, 22, 33, 44]),
 array([111, 222, 333, 444]),
 array([1111, 2222, 3333, 4444])]

Details of the solution:

First, we loop through the columns of the array (the rows of na.T) using a list comprehension, which builds the output list directly instead of creating an empty list and appending to it in a for loop:

columns = [column for column in na.T]

Now, instead of just collecting the columns, we find the unique values in each one using NumPy's "unique" function:

original_arrays = [np.unique(column) for column in na.T]

And the result is a list of NumPy arrays containing the unique values in each column:

>>> original_arrays

[array([1, 2, 3, 4]),
 array([11, 22, 33, 44]),
 array([111, 222, 333, 444]),
 array([1111, 2222, 3333, 4444])]
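
When the original lists are not sorted, the sorted output of np.unique no longer matches them. Here is a minimal sketch of one way around that, assuming the lists are still free of duplicates: ask np.unique for the index of each value's first occurrence (return_index=True) and read the column in that order. The sample lists b1 and b2 and the helper name unique_in_order are made up for illustration and are not part of the question.

```python
import itertools
import numpy as np

# Hypothetical unsorted inputs (not from the question).
b1 = [3, 1, 2]
b2 = [20, 10]

# Same row order as nested for loops with b2 as the innermost loop.
nb = np.array(list(itertools.product(b1, b2)))


def unique_in_order(column):
    """Return the unique values of a 1D array in order of first appearance."""
    _, first_idx = np.unique(column, return_index=True)
    return column[np.sort(first_idx)]


recovered = [unique_in_order(column) for column in nb.T]
print(recovered)
```

If the lists can contain repeated values, neither version can tell how many times a value occurred originally, so the slicing approaches are the safer choice there.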

CodePudding user response：

The initial na and shape:

In [117]: na
Out[117]: 
array([[   1,   11,  111, 1111],
       [   1,   11,  111, 2222],
       [   1,   11,  111, 3333],
       ...,
       [   4,   44,  444, 2222],
       [   4,   44,  444, 3333],
       [   4,   44,  444, 4444]])
In [118]: na.shape
Out[118]: (256, 4)

Your indexing works with

naa=na.reshape(4,4,4,4,4)

Initially I missed the fact that you were using

na.shape = (4,4,4,4,4)

to do this reshape. (I use reshape far more often than the in-place reshape.)
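
To make the difference concrete, here is a small sketch. reshape leaves na's own shape untouched and returns a new array object (for a contiguous array like this one, a view on the same data), while assigning to na.shape changes na itself. The loop at the end is just one assumed way of writing your four indexing lines generically; the names lists and recovered are invented for this example.

```python
import itertools
import numpy as np

a1 = [1, 2, 3, 4]
a2 = [11, 22, 33, 44]
a3 = [111, 222, 333, 444]
a4 = [1111, 2222, 3333, 4444]
lists = [a1, a2, a3, a4]

# itertools.product yields rows in the same order as the nested for loops.
na = np.array(list(itertools.product(*lists)))

naa = na.reshape(4, 4, 4, 4, 4)
print(na.shape)                   # na itself is unchanged
print(naa.shape)                  # the reshaped array is 5D
print(np.shares_memory(na, naa))  # no data was copied

# Generic form of na[:,0,0,0,0], na[0,:,0,0,1], ...:
# take a full slice on axis j, index 0 on the other list axes,
# and pick component j on the last axis.
recovered = []
for j in range(len(lists)):
    idx = [0] * len(lists)
    idx[j] = slice(None)
    recovered.append(naa[tuple(idx) + (j,)])

print(recovered)
```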

The values of a1 through a4 appear in their respective columns, but with many repeats. You can skip the repeats with the right slicing:

In [119]: na[:4,3]
Out[119]: array([1111, 2222, 3333, 4444])
In [122]: na[:16:4,2]
Out[122]: array([111, 222, 333, 444])
In [123]: na[:16*4:16,1]
Out[123]: array([11, 22, 33, 44])
In [124]: na[:16*4*4:16*4,0]
Out[124]: array([1, 2, 3, 4])
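
These four slices follow one pattern: in column j, each value is repeated for as many consecutive rows as the product of the lengths of the later lists, so that product is the step. A sketch of the general case, assuming you know the length of each original list (lengths, step and recovered are names invented for this example, and math.prod needs Python 3.8 or later):

```python
import itertools
import math
import numpy as np

a1 = [1, 2, 3, 4]
a2 = [11, 22, 33, 44]
a3 = [111, 222, 333, 444]
a4 = [1111, 2222, 3333, 4444]
lists = [a1, a2, a3, a4]

# Same 2D array as the nested loops in the question.
na = np.array(list(itertools.product(*lists)))

lengths = [len(lst) for lst in lists]

recovered = []
for j, n in enumerate(lengths):
    # Rows between two successive values in column j;
    # math.prod of an empty sequence is 1, which covers the last column.
    step = math.prod(lengths[j + 1:])
    recovered.append(na[:n * step:step, j])

print(recovered)
```

Because this only slices, the results are views on na, and the order of the original lists is kept even if they were not sorted.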

For the 5D version, your solution is probably as good as any. This isn't a common arrangement of values, so a built-in shortcut is unlikely.