I have a pandas DataFrame with a two-level MultiIndex (instances, timepoints), like so:
                      var_0  var_1
instances timepoints
0         1               1      4
          2               2      5
          3               3      6
          4               5      8
1         1               1      4
          2               2     55
          3               3      6
          4               3      6
2         1               1     42
          2               2      5
          3               3      6
What I am trying to do is convert it to a 3-dimensional NumPy array with shape (n_instances, n_columns, n_timepoints).
I have attempted to reshape it using the values of the instances level, but this is quite a big step up for me technically and I'm stuck. Here is what I tried:
Unique_Cases = df_train.index.levels[0]
print(Unique_Cases)
D = [df_train.loc[instances].values for instances in Unique_Cases]
print(np.array(D,dtype=object).shape)
CodePudding user response:
The shape of your dataframe is not the same as your desired numpy array. So let's transform it first:
# unstack() swings `timepoints` from vertical to horizontal.
# stack(level=0) swings the var_* columns from horizontal to vertical
tmp = df.unstack().stack(level=0)
# tmp:
timepoints        1   2  3    4
instances
0         var_0   1   2  3  5.0
          var_1   4   5  6  8.0
1         var_0   1   2  3  3.0
          var_1   4  55  6  6.0
2         var_0   1   2  3  NaN
          var_1  42   5  6  NaN
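To reproduce the intermediate frame yourself, here is a minimal self-contained sketch. It assumes the question's data is rebuilt by hand with `pd.MultiIndex.from_tuples`; that construction and the name `df` are illustrative and not taken from the original post. Instance 2 has no timepoint 4, which is why NaN shows up in that column.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed construction).
index = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (0, 3), (0, 4),
     (1, 1), (1, 2), (1, 3), (1, 4),
     (2, 1), (2, 2), (2, 3)],
    names=["instances", "timepoints"],
)
df = pd.DataFrame(
    {
        "var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
        "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6],
    },
    index=index,
)

# Rows become (instance, variable) pairs; columns become timepoints.
tmp = df.unstack().stack(level=0)
print(tmp)
```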
Now you can slice the dataframe to get the array you want:
arr = np.array(
    [tmp.xs(i).to_numpy() for i in df.index.unique("instances")]
)
# arr
array([[[ 1.,  2.,  3.,  5.],
        [ 4.,  5.,  6.,  8.]],

       [[ 1.,  2.,  3.,  3.],
        [ 4., 55.,  6.,  6.]],

       [[ 1.,  2.,  3., nan],
        [42.,  5.,  6., nan]]])
# arr.shape
(3, 2, 4)
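As a possible alternative (a sketch, not part of the answer above), the same array can be obtained without a Python loop by unstacking once and reshaping the underlying values. This relies on `unstack()` placing the variable name as the outer column level and the timepoint as the inner one, so each row of the wide frame is laid out variable by variable. The names `wide`, `n_instances`, `n_columns` and `n_timepoints` are illustrative, and the sample data is rebuilt by hand from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (assumed construction).
index = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (0, 3), (0, 4),
     (1, 1), (1, 2), (1, 3), (1, 4),
     (2, 1), (2, 2), (2, 3)],
    names=["instances", "timepoints"],
)
df = pd.DataFrame(
    {
        "var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
        "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6],
    },
    index=index,
)

# One row per instance; columns are (variable, timepoint) pairs.
wide = df.unstack()

n_instances = wide.shape[0]
n_columns = df.shape[1]
n_timepoints = wide.shape[1] // n_columns

# Each row holds all timepoints of var_0, then all timepoints of var_1,
# so a plain reshape yields (n_instances, n_columns, n_timepoints).
arr = wide.to_numpy().reshape(n_instances, n_columns, n_timepoints)
print(arr.shape)  # (3, 2, 4)
```

Missing timepoints are filled with NaN, as in the loop-based version, so the result is a float array.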