How to turn a pandas DataFrame of lists of numbers into a 3-dimensional array?

Time:04-28

I have a pandas DataFrame with a structure like this:

In [22]: df
Out[22]: 
           a             b
0  [1, 2, 3]     [4, 5, 6]
1  [7, 8, 9]  [10, 11, 12]

To build it, do something like:

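import pandas as pd

# Start with placeholder objects so both columns get dtype=object,
# then overwrite each cell with a list.
df = pd.DataFrame([[object(), object()], [object(), object()]], columns=["a", "b"])
df.iat[0, 0] = [1, 2, 3]
df.iat[0, 1] = [4, 5, 6]
df.iat[1, 0] = [7, 8, 9]
df.iat[1, 1] = [10, 11, 12]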

What would be the simplest way to turn it into a NumPy 3-dimensional array? This would be the expected result:

In [20]: arr
Out[20]: 
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [21]: arr.shape
Out[21]: (2, 2, 3)

In [22]: df.iloc[0, 0]
Out[22]: [1, 2, 3]

In [23]: arr[0, 0]
Out[23]: array([1, 2, 3])

In [24]: df.iloc[-1]
Out[24]: 
a       [7, 8, 9]
b    [10, 11, 12]
Name: 1, dtype: object

In [25]: arr[-1]
Out[25]: 
array([[ 7,  8,  9],
       [10, 11, 12]])

I have tried several things, without success:

In [6]: df.values  # Notice the dtype
Out[6]: 
array([[list([1, 2, 3]), list([4, 5, 6])],
       [list([7, 8, 9]), list([10, 11, 12])]], dtype=object)

In [7]: df.values.astype(int)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 df.values.astype(int)

ValueError: setting an array element with a sequence.

In [14]: df.values.reshape(2, 2, -1)
Out[14]: 
array([[[list([1, 2, 3])],
        [list([4, 5, 6])]],

       [[list([7, 8, 9])],
        [list([10, 11, 12])]]], dtype=object)

Answer:

One option is to convert df to a nested list, then pass that list to np.array:

import numpy as np

out = np.array(df.to_numpy().tolist())

Output:

>>> out
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

>>> out.shape
(2, 2, 3)

>>> out[0,0]
array([1, 2, 3])

>>> out[-1]
array([[ 7,  8,  9],
       [10, 11, 12]])
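
For reference, here is a minimal end-to-end sketch of the same idea. It assumes every cell holds a list of the same length; if the lengths differ, np.array cannot build a regular 3-dimensional array (depending on the NumPy version it either raises or falls back to an object array), so the sketch checks the lengths first. The helper name frame_of_lists_to_array is hypothetical, not part of pandas or NumPy, and the DataFrame is built from a dict purely for brevity.

```python
import numpy as np
import pandas as pd


def frame_of_lists_to_array(df: pd.DataFrame) -> np.ndarray:
    """Convert a DataFrame whose cells are equal-length lists into a 3-D array.

    Hypothetical helper for illustration; not a pandas or NumPy function.
    """
    # to_numpy() gives a 2-D object array of lists; tolist() turns it into
    # plain nested lists: rows -> columns -> the list stored in each cell.
    nested = df.to_numpy().tolist()

    # Collect the distinct cell lengths; more than one means ragged data.
    lengths = {len(cell) for row in nested for cell in row}
    if len(lengths) > 1:
        raise ValueError(f"cells have differing lengths: {sorted(lengths)}")

    # NumPy infers the (rows, columns, list_length) shape from the nesting.
    return np.array(nested)


# Same data as in the question, built from a dict of columns.
df = pd.DataFrame({"a": [[1, 2, 3], [7, 8, 9]], "b": [[4, 5, 6], [10, 11, 12]]})
arr = frame_of_lists_to_array(df)
print(arr.shape)  # (2, 2, 3)
```

The first axis follows the DataFrame rows, the second follows its columns, and the third indexes into each cell's list, so arr[i, j] corresponds to df.iloc[i, j].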