Converting a multi-index into a 3D NumPy array

I have a DataFrame with a two-level pd.MultiIndex, like so:

                      var_0  var_1
instances timepoints              
0         1               1      4
          2               2      5
          3               3      6
          4               5      8
1         1               1      4
          2               2     55
          3               3      6
          4               3      6
2         1               1     42
          2               2      5
          3               3      6

What I am trying to do is convert it to a 3-dimensional NumPy array with shape (n_instances, n_columns, n_timepoints).
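For reference, here is a minimal sketch that rebuilds the sample frame (values copied from the table above and assumed to be integers) and reads each target dimension off the index. It also shows why some filler value is unavoidable: instance 2 has no timepoint 4.

```python
import pandas as pd

# Sample frame rebuilt from the table above (values assumed to be ints).
idx = pd.MultiIndex.from_tuples(
    [(i, t) for i, n in [(0, 4), (1, 4), (2, 3)] for t in range(1, n + 1)],
    names=["instances", "timepoints"],
)
df_train = pd.DataFrame(
    {"var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
     "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6]},
    index=idx,
)

# Read each target dimension off the index and the columns.
n_instances = df_train.index.unique(level="instances").size
n_columns = df_train.shape[1]
n_timepoints = df_train.index.unique(level="timepoints").size

target_shape = (n_instances, n_columns, n_timepoints)
print(target_shape)  # (3, 2, 4)

# 3 * 2 * 4 = 24 slots but only 11 * 2 = 22 values: instance 2 lacks
# timepoint 4, so two slots need a filler such as NaN.
print(n_instances * n_columns * n_timepoints - df_train.size)  # 2
```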

I have tried reshaping by looping over the values of the instances level, but this is quite a step up for me technically and I'm stuck:

    import numpy as np

    Unique_Cases = df_train.index.levels[0]
    print(Unique_Cases)
    D = [df_train.loc[instances].values for instances in Unique_Cases]
    print(np.array(D,dtype=object).shape)
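The loop above is close, but each `df_train.loc[instance]` block is shaped (n_timepoints, n_columns), the transpose of what is wanted, and instance 2 has one fewer row, so the blocks cannot be stacked into one regular 3-D array. A minimal sketch of finishing that loop, assuming missing timepoints are always the trailing ones (so padding with NaN at the end is correct). It uses `index.unique(level=...)` rather than `index.levels[0]`, since `levels` can keep values that no longer occur after slicing.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table above (values assumed to be ints).
idx = pd.MultiIndex.from_tuples(
    [(i, t) for i, n in [(0, 4), (1, 4), (2, 3)] for t in range(1, n + 1)],
    names=["instances", "timepoints"],
)
df_train = pd.DataFrame(
    {"var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
     "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6]},
    index=idx,
)

n_timepoints = df_train.index.unique(level="timepoints").size

blocks = []
for inst in df_train.index.unique(level="instances"):
    # .loc[inst] is (n_timepoints_i, n_columns); transpose to (n_columns, n_timepoints_i).
    block = df_train.loc[inst].to_numpy(dtype=float).T
    # Assumption: missing timepoints are the trailing ones, so pad at the end.
    missing = n_timepoints - block.shape[1]
    blocks.append(np.pad(block, ((0, 0), (0, missing)), constant_values=np.nan))

arr = np.stack(blocks)
print(arr.shape)  # (3, 2, 4)
```

If gaps can occur in the middle of a series, this positional padding would misalign values; reindexing on the timepoints, as the answer below does via `unstack()`, avoids that.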

Answer:

Your DataFrame has one row per (instance, timepoint) pair and one column per variable, which is not the layout of the array you want. So let's reshape it first:

# unstack() swings the `timepoints` index level from vertical to horizontal (rows -> columns)
# stack(level=0) swings the var_* column level from horizontal to vertical (columns -> rows)
tmp = df.unstack().stack(level=0)

# tmp:
timepoints        1   2  3    4
instances                      
0         var_0   1   2  3  5.0
          var_1   4   5  6  8.0
1         var_0   1   2  3  3.0
          var_1   4  55  6  6.0
2         var_0   1   2  3  NaN
          var_1  42   5  6  NaN
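To see what each swing does on its own, here is a sketch that rebuilds the sample frame (values copied from the question, assumed to be integers) and prints the intermediate shapes; the name `wide` is just for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (values assumed to be ints).
idx = pd.MultiIndex.from_tuples(
    [(i, t) for i, n in [(0, 4), (1, 4), (2, 3)] for t in range(1, n + 1)],
    names=["instances", "timepoints"],
)
df = pd.DataFrame(
    {"var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
     "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6]},
    index=idx,
)

# Swing 1: move `timepoints` from the rows into the columns.
# Result: one row per instance, columns (var_0, 1) ... (var_1, 4).
wide = df.unstack()
print(wide.shape)  # (3, 8)

# Swing 2: move the variable level (level 0 of the columns) into the rows.
# Result: one row per (instance, variable), one column per timepoint.
tmp = wide.stack(level=0)
print(tmp.shape)  # (6, 4)
```

The NaN for instance 2 appears in the first swing: `unstack()` builds a full instance-by-timepoint grid, so any combination that was never recorded is filled with NaN.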

Now you can take each instance's block with `xs` and stack the blocks into the array you want:

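import numpy as np

# tmp.xs(i) is instance i's (n_columns, n_timepoints) block; np.array stacks them
arr = np.array(
    [tmp.xs(i).to_numpy() for i in df.index.unique("instances")]
)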

# arr
array([[[ 1.,  2.,  3.,  5.],
        [ 4.,  5.,  6.,  8.]],

       [[ 1.,  2.,  3.,  3.],
        [ 4., 55.,  6.,  6.]],

       [[ 1.,  2.,  3., nan],
        [42.,  5.,  6., nan]]])

# arr.shape
(3, 2, 4)
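The per-instance loop can also be replaced by a single `reshape`. This is a sketch, not part of the answer above: it relies on `df.unstack("timepoints")` giving one row per instance, and the explicit `reindex` to a variable-major column order (`var_0` at every timepoint, then `var_1`) is an added assumption that guarantees each row splits cleanly into (n_columns, n_timepoints) and fills any missing cell with NaN.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (values assumed to be ints).
idx = pd.MultiIndex.from_tuples(
    [(i, t) for i, n in [(0, 4), (1, 4), (2, 3)] for t in range(1, n + 1)],
    names=["instances", "timepoints"],
)
df = pd.DataFrame(
    {"var_0": [1, 2, 3, 5, 1, 2, 3, 3, 1, 2, 3],
     "var_1": [4, 5, 6, 8, 4, 55, 6, 6, 42, 5, 6]},
    index=idx,
)

n_instances = df.index.unique(level="instances").size
n_columns = df.shape[1]
timepoints = df.index.unique(level="timepoints").sort_values()

# One row per instance, with columns forced into variable-major order:
# (var_0, t1..t4), then (var_1, t1..t4); absent cells become NaN.
cols = pd.MultiIndex.from_product([df.columns, timepoints])
wide = df.unstack("timepoints").reindex(columns=cols)

# A C-order reshape splits each row of n_columns * n_timepoints values
# into an (n_columns, n_timepoints) block.
arr = wide.to_numpy(dtype=float).reshape(n_instances, n_columns, len(timepoints))
print(arr.shape)  # (3, 2, 4)
```

Either way, the missing timepoint comes out as NaN, which is why the resulting array is float rather than int.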