I have a pandas dataframe with a column called 'corr'. Each row contains an ndarray of float64. The following code is giving me issues:
import numpy as np
import pandas as pd
experimentDataFrame = pd.DataFrame({'corr': [np.array([1.0,2.0]),np.array([3.0,4.0]),np.array([5.0,6.0])]})
corr = experimentDataFrame['corr'].to_numpy(copy=True)
print ([type(corr), corr.shape])
print ([type(corr[0]), corr[0].shape])
print ([type(corr[0][0]), corr[0][0].shape])
corr = corr.flatten()
print ([type(corr), corr.shape])
print ([type(corr[0]), corr[0].shape])
print ([type(corr[0][0]), corr[0][0].shape])
The output is:
[<class 'numpy.ndarray'>, (3,)]
[<class 'numpy.ndarray'>, (2,)]
[<class 'numpy.float64'>, ()]
[<class 'numpy.ndarray'>, (3,)]
[<class 'numpy.ndarray'>, (2,)]
[<class 'numpy.float64'>, ()]
I've also tried corr.ravel() and corr.reshape(-1) instead of flatten, with no difference. I've also tried corr.reshape(6), but on my full data I get: ValueError: cannot reshape array of size 35 into shape (6,)
What I'm expecting is that after flattening, corr[0] should be a float64 and not still an ndarray. My strong suspicion is that, since corr is an ndarray of ndarrays of unknown length, flatten (and the others) don't work. Is there a function that will do this without iterating manually?
Answer:
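The problem is that experimentDataFrame['corr'].to_numpy(copy=True) is already flat: its shape is (35,) for your full data and (3,) for the example above. You have a dtype=object array, a 1-D array whose elements are references to the inner arrays, so flatten, ravel and reshape(-1) have nothing left to flatten.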
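A quick way to confirm this diagnosis, reusing the toy DataFrame from the question, is to inspect the dtype and shape before and after flatten(). This is only a check, not a fix:

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: each cell holds a 1-D float64 array.
df = pd.DataFrame({'corr': [np.array([1.0, 2.0]),
                            np.array([3.0, 4.0]),
                            np.array([5.0, 6.0])]})

corr = df['corr'].to_numpy(copy=True)

# The outer array is 1-D and stores one reference per row.
print(corr.dtype)   # object
print(corr.shape)   # (3,)

# flatten() has nothing to do on an array that is already 1-D,
# so its elements are still the inner ndarrays.
flat = corr.flatten()
print(flat.shape)       # (3,)
print(type(flat[0]))    # <class 'numpy.ndarray'>
```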
You just want something like:
corr = np.concatenate([arr.ravel() for arr in experimentDataFrame['corr']])
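The ravel() inside the comprehension is what makes this version robust to inner arrays that are not 1-D or not all the same length. The sketch below uses made-up data (a 2-element, a 3-element and a 2x2 inner array); the object column is filled element by element so that pandas keeps each cell's shape as-is:

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: inner arrays differ in length and one is 2-D.
cells = [np.array([1.0, 2.0]),
         np.array([3.0, 4.0, 5.0]),
         np.array([[6.0, 7.0], [8.0, 9.0]])]

# Fill an object array one element at a time so each cell keeps its own shape.
col = np.empty(len(cells), dtype=object)
for i, cell in enumerate(cells):
    col[i] = cell
df = pd.DataFrame({'corr': col})

# ravel() makes every cell 1-D first, so concatenate can join them end to end.
corr = np.concatenate([arr.ravel() for arr in df['corr']])
print(corr)  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]

# Without ravel(), mixing 1-D and 2-D inputs is rejected.
try:
    np.concatenate(df['corr'].tolist())
except ValueError as err:
    print("without ravel():", err)
```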
If all the inner arrays in your column are already flat (1-D), you can possibly just do:
corr = np.concatenate(experimentDataFrame['corr'].tolist())
It isn't clear from your question whether that is the case; if it is, either of those should work.
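If every inner array has the same length, as in the question's toy example, np.stack (a different function from the ones in this answer) produces a genuine 2-D float64 array, after which the reshape(6) from the question works. This is a sketch that assumes equal-length rows; np.stack raises ValueError otherwise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'corr': [np.array([1.0, 2.0]),
                            np.array([3.0, 4.0]),
                            np.array([5.0, 6.0])]})

# np.stack requires all inner arrays to share one shape; it builds a
# real 2-D float64 array with one row per DataFrame row.
mat = np.stack(df['corr'].to_numpy())
print(mat.shape)  # (3, 2)

# The data now really contains 6 float elements, so reshape(6) succeeds.
flat = mat.reshape(6)
print(flat)       # [1. 2. 3. 4. 5. 6.]
```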
EDIT: Actually, you don't need .tolist(); just
corr = np.concatenate(experimentDataFrame['corr'])
works.
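One edge case worth guarding against: np.concatenate raises ValueError when it receives no arrays at all, which happens if the column is empty (for example after filtering rows). A minimal sketch of a wrapper, where flatten_column is a hypothetical name and returning an empty float64 array is an assumed convention:

```python
import numpy as np
import pandas as pd

def flatten_column(col: pd.Series) -> np.ndarray:
    """Hypothetical helper: join the per-row arrays of an object column.

    np.concatenate raises ValueError for zero inputs, so an empty column
    is special-cased to return an empty float64 array instead.
    """
    if len(col) == 0:
        return np.empty(0, dtype=np.float64)
    # np.ravel also copes with inner arrays that are not 1-D.
    return np.concatenate([np.ravel(cell) for cell in col])

df = pd.DataFrame({'corr': [np.array([1.0, 2.0]),
                            np.array([3.0, 4.0]),
                            np.array([5.0, 6.0])]})

full = flatten_column(df['corr'])
empty = flatten_column(df['corr'].iloc[:0])  # zero rows
print(full)         # [1. 2. 3. 4. 5. 6.]
print(empty.shape)  # (0,)
```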