I have a pandas DataFrame with a structure like this:
In [22]: df
Out[22]:
a b
0 [1, 2, 3] [4, 5, 6]
1 [7, 8, 9] [10, 11, 12]
To build it, do something like this:
import pandas as pd
df = pd.DataFrame([[object(), object()], [object(), object()]], columns=["a", "b"])
df.iat[0, 0] = [1, 2, 3]
df.iat[0, 1] = [4, 5, 6]
df.iat[1, 0] = [7, 8, 9]
df.iat[1, 1] = [10, 11, 12]
What would be the simplest way to turn it into a 3-dimensional NumPy array? This is the expected result:
In [20]: arr
Out[20]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
In [21]: arr.shape
Out[21]: (2, 2, 3)
In [22]: df.iloc[0, 0]
Out[22]: [1, 2, 3]
In [23]: arr[0, 0]
Out[23]: array([1, 2, 3])
In [24]: df.iloc[-1]
Out[24]:
a [7, 8, 9]
b [10, 11, 12]
Name: 1, dtype: object
In [25]: arr[-1]
Out[25]:
array([[ 7, 8, 9],
[10, 11, 12]])
I have tried several things, without success:
In [6]: df.values # Notice the dtype
Out[6]:
array([[list([1, 2, 3]), list([4, 5, 6])],
[list([7, 8, 9]), list([10, 11, 12])]], dtype=object)
In [7]: df.values.astype(int)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 df.values.astype(int)
ValueError: setting an array element with a sequence.
In [14]: df.values.reshape(2, 2, -1)
Out[14]:
array([[[list([1, 2, 3])],
[list([4, 5, 6])]],
[[list([7, 8, 9])],
[list([10, 11, 12])]]], dtype=object)
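Both attempts seem to fail for the same reason: df.values is a 2-D object array whose four elements are Python lists, so NumPy never looks inside them. A minimal check, assuming the same data as above (the frame is rebuilt here from a dict only to keep the snippet self-contained):

```python
import numpy as np
import pandas as pd

# Same data as in the question, built from a dict of columns (an assumption
# made for brevity; the iat-based construction gives an equivalent frame).
df = pd.DataFrame({"a": [[1, 2, 3], [7, 8, 9]],
                   "b": [[4, 5, 6], [10, 11, 12]]})

values = df.values
print(values.shape)        # (2, 2): four opaque objects, not twelve integers
print(values.dtype)        # object
print(type(values[0, 0]))  # <class 'list'>

# With only four elements, reshape(2, 2, -1) can only infer a last axis of 1.
reshaped = values.reshape(2, 2, -1)
print(reshaped.shape)      # (2, 2, 1)
```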
Answer:
One option is to convert df to a nested list with to_numpy().tolist(), then pass that list to np.array, which infers the three dimensions from the nesting:
out = np.array(df.to_numpy().tolist())
Output:
>>> out
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
>>> out.shape
(2, 2, 3)
>>> out[0,0]
array([1, 2, 3])
>>> out[-1]
array([[ 7, 8, 9],
[10, 11, 12]])
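For reference, here is a self-contained sketch of this approach, plus an np.stack variant that gives the same result. The helper name frame_to_3d is hypothetical, and both versions assume every cell holds a list of the same length; with ragged lists the conversion would not yield a clean 3-D numeric array.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; each cell holds a Python list.
df = pd.DataFrame({"a": [[1, 2, 3], [7, 8, 9]],
                   "b": [[4, 5, 6], [10, 11, 12]]})


def frame_to_3d(frame):
    """Hypothetical helper: frame of equal-length lists -> 3-D array."""
    # tolist() turns the 2-D object array into plain nested lists,
    # which np.array can then inspect all the way down.
    nested = frame.to_numpy().tolist()
    return np.array(nested)


out = frame_to_3d(df)

# Alternative: stack the cells of each row, then stack the rows.
alt = np.stack([np.stack(list(row)) for row in df.to_numpy()])

print(out.shape)  # (2, 2, 3)
```

The tolist() route is the shorter of the two; the np.stack variant mainly makes the row-by-row structure explicit.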