Extract rows from Multi-index dataframe where child level matches numbers on Numpy array

Time:07-02

Given the following MultiIndex dataframe:

        a   b   c
0   0   0   42  65
    1   6   0   340
    2   5   71  800
    3   2   51  409
    4   0   23  279
    5   8   38  549
1   0   1   23  252
    1   9   13  977
    2   1   19  943
2   0   2   23  295
    1   3   39  458
    2   1   62  308
    3   0   95  954
    4   9   78  535
3   0   4   67  849
    1   3   46  761
    2   7   49  485
    3   0   44  638

How can I extract the rows whose second (child) index level matches the numbers in a NumPy array, one number per first-level group? For instance, if my array is:

a = np.array([2, 2, 4, 3])

The result should be a dataframe like:

    a   b   c
0   5   71  800
1   1   19  943
2   9   78  535
3   0   44  638

I have tried the following:

i,j = df.index.levels
ix = a
df1 = pd.DataFrame(df.to_numpy()[ix])

but that's giving me the wrong result. The dataframe I'm currently getting is:

    a   b   c
0   5   71  800
1   5   71  800
2   0   23  279
3   2   51  409

It seems to be using the values in a as flat row positions into the whole array, instead of as labels of the second index level within each group.

CodePudding user response:

If you want one child-level value per first-level group, pair each first-level label with the requested child label and pass those pairs to df.loc:

a = np.array([2, 2, 4, 3])
b = df.index.get_level_values(0).unique().to_numpy()  # to_numpy() is optional
# b = array([0, 1, 2, 3])

# pairs each first-level label with its child label: (0, 2), (1, 2), (2, 4), (3, 3)
df.loc[list(zip(b, a))]

output:

     a   b    c
0 2  5  71  800
1 2  1  19  943
2 4  9  78  535
3 3  0  44  638
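
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the table printed in the question and then applies the lookup. The reconstruction and the names sizes, picked and result are assumptions made for illustration, not part of the original post. The final droplevel(1) call is an optional extra step that removes the child level, so the index becomes 0..3 as in the expected output of the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame from its printed table (assumed reconstruction).
sizes = [6, 3, 5, 4]  # number of child rows under each first-level label
index = pd.MultiIndex.from_tuples(
    [(outer, inner) for outer, n in enumerate(sizes) for inner in range(n)]
)
data = [
    [0, 42, 65], [6, 0, 340], [5, 71, 800], [2, 51, 409], [0, 23, 279], [8, 38, 549],
    [1, 23, 252], [9, 13, 977], [1, 19, 943],
    [2, 23, 295], [3, 39, 458], [1, 62, 308], [0, 95, 954], [9, 78, 535],
    [4, 67, 849], [3, 46, 761], [7, 49, 485], [0, 44, 638],
]
df = pd.DataFrame(data, index=index, columns=["a", "b", "c"])

a = np.array([2, 2, 4, 3])
b = df.index.get_level_values(0).unique()

# One (outer, inner) label pair per first-level group.
picked = df.loc[list(zip(b, a))]

# Drop the child level so only the first-level labels remain in the index.
result = picked.droplevel(1)
print(result)
```

The difference from the attempt in the question is that df.loc looks rows up by label pair, whereas df.to_numpy()[a] treats 2, 2, 4, 3 as positions counted from the top of the whole table.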

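If some of the requested pairs might not exist in the index, use reindex instead. Missing pairs then show up as NaN rows, which also upcasts the integer columns to float: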

a = np.array([2, 5, 0, 2])
# b = df.index.get_level_values(0).unique()

df.reindex(zip(b,a))

output:

       a     b      c
0 2  5.0  71.0  800.0
1 5  NaN   NaN    NaN
2 0  2.0  23.0  295.0
3 2  7.0  49.0  485.0
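
As a possible follow-up, the sketch below shows one way to find out which requested pairs were missing and to get back an integer-typed frame with only the rows that exist. It again rebuilds the frame from the table in the question. The names wanted, out, missing and clean are illustrative assumptions, and dropping the NaN rows is only one of several reasonable ways to treat bad input.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame from its printed table (assumed reconstruction).
sizes = [6, 3, 5, 4]
index = pd.MultiIndex.from_tuples(
    [(outer, inner) for outer, n in enumerate(sizes) for inner in range(n)]
)
data = [
    [0, 42, 65], [6, 0, 340], [5, 71, 800], [2, 51, 409], [0, 23, 279], [8, 38, 549],
    [1, 23, 252], [9, 13, 977], [1, 19, 943],
    [2, 23, 295], [3, 39, 458], [1, 62, 308], [0, 95, 954], [9, 78, 535],
    [4, 67, 849], [3, 46, 761], [7, 49, 485], [0, 44, 638],
]
df = pd.DataFrame(data, index=index, columns=["a", "b", "c"])

a = np.array([2, 5, 0, 2])  # (1, 5) does not exist in df
b = df.index.get_level_values(0).unique()

# Requested (outer, inner) pairs as a MultiIndex.
wanted = pd.MultiIndex.from_arrays([b, a])
out = df.reindex(wanted)

# Pairs absent from df come back as all-NaN rows.
missing = out.index[out.isna().all(axis=1).to_numpy()]

# Keep only the rows that were found and restore the integer dtype.
clean = out.dropna().astype("int64")
print(clean)
```

Note that this NaN-based check assumes the original frame has no rows that are entirely NaN themselves; if it could, comparing wanted against df.index directly would be the safer test.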