I have the following data frame.
import numpy as np
import pandas as pd

test = {
    "a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]],
    "b": [[[1, 2], [3, 6]], [[1, 2], [3, 4]]],
}
df = pd.DataFrame(test)
df
| | a | b |
|---|---|---|
| 0 | [[1,2],[3,4]] | [[1,2],[3,6]] |
| 1 | [[1,2],[3,4]] | [[1,2],[3,4]] |
For example, I want to convert the first column to a NumPy array with shape (2,2,2). If I use the following code, I get an array with shape (2,) instead of (2,2,2):
df['a'].apply(np.asarray).values
How can I get the array with shape (2,2,2)?
CodePudding user response:
Ah, stupid question. The following code works:
np.array(list(df['a']))
Does anyone have a better solution? Thanks!
CodePudding user response:
When creating dataframes that contain lists or arrays in the columns, it's a good idea to have a clear sense of what's stored.
In [545]: df
Out[545]:
a b
0 [[1, 2], [3, 4]] [[1, 2], [3, 6]]
1 [[1, 2], [3, 4]] [[1, 2], [3, 4]]
A frame is a 2d object; one column, a Series, is 1d. to_numpy returns an array (np.array(df) and df.values do the same):
In [546]: df.to_numpy()
Out[546]:
array([[list([[1, 2], [3, 4]]), list([[1, 2], [3, 6]])],
       [list([[1, 2], [3, 4]]), list([[1, 2], [3, 4]])]], dtype=object)
It's 2d, but the object dtype means it contains (references to) lists. df.info() also tells us that.
In [547]: df['a'].to_numpy()
Out[547]: array([list([[1, 2], [3, 4]]), list([[1, 2], [3, 4]])], dtype=object)
to_numpy of a column is 1d, again with object dtype.
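The point about the 1d object array can be checked directly. The following is a minimal sketch that rebuilds the frame from the question and inspects the shapes; the variable names (`col`, `applied`, and so on) are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]],
    "b": [[[1, 2], [3, 6]], [[1, 2], [3, 4]]],
})

# The column is a 1d object array: one slot per row, each holding a list.
col = df["a"].to_numpy()
col_shape = col.shape        # (2,)
col_dtype = col.dtype        # object
first_type = type(col[0])    # list

# Converting each cell to an array does not change the outer container:
# it is still a 1d object array, now holding two separate (2, 2) arrays.
applied = df["a"].apply(np.asarray).to_numpy()
applied_shape = applied.shape      # (2,)
inner_shape = applied[0].shape     # (2, 2)
```

This is why the attempt in the question reports shape (2,): the inner arrays are never joined into a single block.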
In [548]: df['a'].to_list()
Out[548]: [[[1, 2], [3, 4]], [[1, 2], [3, 4]]]
This is a pure (nested) list. As with a hand-written nested list, it can be turned into an array with:
In [550]: np.array(df['a'].to_list())
Out[550]:
array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])
For the object-array version (the result of to_numpy), you need np.stack to combine the elements:
In [551]: np.stack(df['a'].to_numpy())
Out[551]:
array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])
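Stacking only works when every cell has the same shape. As a rough sketch, a small helper could check that before calling np.stack; the function name `column_to_array` is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def column_to_array(series):
    """Hypothetical helper: stack a column of equally shaped nested lists."""
    # Convert each cell on its own, so each one gets its own shape.
    items = [np.asarray(cell) for cell in series]
    shapes = {item.shape for item in items}
    if len(shapes) != 1:
        raise ValueError(f"cannot stack cells with differing shapes: {shapes}")
    # np.stack joins the cells along a new leading axis.
    return np.stack(items)


df = pd.DataFrame({"a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]]})
arr = column_to_array(df["a"])   # shape (2, 2, 2)
```

If one row held, say, a 1x2 list while the others held 2x2 lists, the helper would raise a ValueError with the shapes it found instead of attempting the stack.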
A different concatenation method, vstack, joins the elements along the existing first axis and gives a 2d array:
In [552]: np.vstack(df['a'].to_numpy())
Out[552]:
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])
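To make the difference between the two results concrete, here is a short sketch comparing their shapes. It assumes the same column as in the question; the names `stacked` and `vstacked` are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]]})
cells = df["a"].to_numpy()

# stack adds a new leading axis: one (2, 2) block per row.
stacked = np.stack(cells)      # shape (2, 2, 2)

# vstack joins along the existing first axis: rows are appended.
vstacked = np.vstack(cells)    # shape (4, 2)

# The two hold the same values; vstack just drops the per-row grouping.
same_values = np.array_equal(vstacked, stacked.reshape(-1, 2))
```

For the shape (2,2,2) asked for in the question, stack (or np.array on the list) is the one to use; vstack fits when the per-row grouping is not needed.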