Given the following MultiIndex DataFrame:
        a   b    c
0 0     0  42   65
  1     6   0  340
  2     5  71  800
  3     2  51  409
  4     0  23  279
  5     8  38  549
1 0     1  23  252
  1     9  13  977
  2     1  19  943
2 0     2  23  295
  1     3  39  458
  2     1  62  308
  3     0  95  954
  4     9  78  535
3 0     4  67  849
  1     3  46  761
  2     7  49  485
  3     0  44  638
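To make the question reproducible, here is a minimal sketch that rebuilds the frame from the table above. It assumes both index levels are unnamed integer labels; the helper name `rows` is only for illustration.

```python
import pandas as pd

# (outer, inner, a, b, c) for every row, copied from the table above
rows = [
    (0, 0, 0, 42, 65), (0, 1, 6, 0, 340), (0, 2, 5, 71, 800),
    (0, 3, 2, 51, 409), (0, 4, 0, 23, 279), (0, 5, 8, 38, 549),
    (1, 0, 1, 23, 252), (1, 1, 9, 13, 977), (1, 2, 1, 19, 943),
    (2, 0, 2, 23, 295), (2, 1, 3, 39, 458), (2, 2, 1, 62, 308),
    (2, 3, 0, 95, 954), (2, 4, 9, 78, 535), (3, 0, 4, 67, 849),
    (3, 1, 3, 46, 761), (3, 2, 7, 49, 485), (3, 3, 0, 44, 638),
]

# The first two fields become an unnamed two-level index, the rest become columns a, b, c
index = pd.MultiIndex.from_tuples([r[:2] for r in rows])
df = pd.DataFrame([r[2:] for r in rows], index=index, columns=["a", "b", "c"])
print(df)
```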
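How can I extract the rows of the DataFrame that match the values in a NumPy array? The i-th value of the array should select, within the i-th first-level group, the row whose second-level label equals that value. For instance, if my array is: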
a = np.array([2, 2, 4, 3])
The result should be a dataframe like:
   a   b    c
0  5  71  800
1  1  19  943
2  9  78  535
3  0  44  638
I have tried the following:
i,j = df.index.levels
ix = a
df1 = pd.DataFrame(df.to_numpy()[ix])
but that's giving me the wrong result. The dataframe I'm currently getting is:
   a   b    c
0  5  71  800
1  5  71  800
2  0  23  279
3  2  51  409
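It looks like the array values are being used as row positions in the whole DataFrame (rows 2, 2, 4 and 3, which all fall in the first group) instead of as second-level labels within each first-level group.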
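A quick way to confirm that diagnosis: `to_numpy()` discards the index, so `[a]` is plain positional NumPy indexing. Looking up the same positions in the MultiIndex shows which labels were actually hit. This is a minimal sketch that rebuilds the example frame so it runs on its own.

```python
import numpy as np
import pandas as pd

rows = [
    (0, 0, 0, 42, 65), (0, 1, 6, 0, 340), (0, 2, 5, 71, 800),
    (0, 3, 2, 51, 409), (0, 4, 0, 23, 279), (0, 5, 8, 38, 549),
    (1, 0, 1, 23, 252), (1, 1, 9, 13, 977), (1, 2, 1, 19, 943),
    (2, 0, 2, 23, 295), (2, 1, 3, 39, 458), (2, 2, 1, 62, 308),
    (2, 3, 0, 95, 954), (2, 4, 9, 78, 535), (3, 0, 4, 67, 849),
    (3, 1, 3, 46, 761), (3, 2, 7, 49, 485), (3, 3, 0, 44, 638),
]
index = pd.MultiIndex.from_tuples([r[:2] for r in rows])
df = pd.DataFrame([r[2:] for r in rows], index=index, columns=["a", "b", "c"])

a = np.array([2, 2, 4, 3])

# The original attempt: absolute row positions 2, 2, 4, 3 of the whole frame
picked = df.to_numpy()[a]

# The (outer, inner) labels sitting at those positions: all in outer group 0
hit_labels = list(df.index[a])
print(hit_labels)
```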
Answer:
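If you want one second-level label per first-level group, pair each first-level label with the corresponding array value and pass the resulting (outer, inner) tuples to .loc: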
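import numpy as np
a = np.array([2, 2, 4, 3])
# unique first-level labels, in order of appearance
b = df.index.get_level_values(0).unique().to_numpy() # to_numpy() is optional
# b = array([0, 1, 2, 3])
# pairs (0, 2), (1, 2), (2, 4), (3, 3); list() because zip is a one-shot iterator
df.loc[list(zip(b, a))]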
output:
     a   b    c
0 2  5  71  800
1 2  1  19  943
2 4  9  78  535
3 3  0  44  638
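The expected output in the question has a flat 0..3 index, while .loc keeps both index levels. A minimal sketch, assuming you simply want to discard those levels, adds `reset_index(drop=True)`; the frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

rows = [
    (0, 0, 0, 42, 65), (0, 1, 6, 0, 340), (0, 2, 5, 71, 800),
    (0, 3, 2, 51, 409), (0, 4, 0, 23, 279), (0, 5, 8, 38, 549),
    (1, 0, 1, 23, 252), (1, 1, 9, 13, 977), (1, 2, 1, 19, 943),
    (2, 0, 2, 23, 295), (2, 1, 3, 39, 458), (2, 2, 1, 62, 308),
    (2, 3, 0, 95, 954), (2, 4, 9, 78, 535), (3, 0, 4, 67, 849),
    (3, 1, 3, 46, 761), (3, 2, 7, 49, 485), (3, 3, 0, 44, 638),
]
index = pd.MultiIndex.from_tuples([r[:2] for r in rows])
df = pd.DataFrame([r[2:] for r in rows], index=index, columns=["a", "b", "c"])

a = np.array([2, 2, 4, 3])
b = df.index.get_level_values(0).unique()

# Select one (outer, inner) pair per group, then drop both index levels
# so the result gets a fresh 0..n-1 index like the expected output
result = df.loc[list(zip(b, a))].reset_index(drop=True)
print(result)
```

Use `reset_index()` without `drop=True` instead if you want to keep the original labels as ordinary columns.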
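If the array may contain labels that don't exist in a group, use reindex: missing (outer, inner) pairs come back as rows of NaN instead of raising a KeyError, as .loc would: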
a = np.array([2, 5, 0, 2])
# b is reused from above: df.index.get_level_values(0).unique()
df.reindex(list(zip(b, a)))
output:
       a     b      c
0 2  5.0  71.0  800.0
1 5  NaN   NaN    NaN
2 0  2.0  23.0  295.0
3 2  7.0  49.0  485.0
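If you would rather drop missing pairs than get NaN rows, a different technique, a vectorized boolean mask, also works. This is a sketch under two assumptions: the array has exactly one entry per first-level group, and its order matches the order in which those groups first appear. `pd.factorize` maps each row's first-level label to its group number, which is then used to look up the requested second-level label.

```python
import numpy as np
import pandas as pd

rows = [
    (0, 0, 0, 42, 65), (0, 1, 6, 0, 340), (0, 2, 5, 71, 800),
    (0, 3, 2, 51, 409), (0, 4, 0, 23, 279), (0, 5, 8, 38, 549),
    (1, 0, 1, 23, 252), (1, 1, 9, 13, 977), (1, 2, 1, 19, 943),
    (2, 0, 2, 23, 295), (2, 1, 3, 39, 458), (2, 2, 1, 62, 308),
    (2, 3, 0, 95, 954), (2, 4, 9, 78, 535), (3, 0, 4, 67, 849),
    (3, 1, 3, 46, 761), (3, 2, 7, 49, 485), (3, 3, 0, 44, 638),
]
index = pd.MultiIndex.from_tuples([r[:2] for r in rows])
df = pd.DataFrame([r[2:] for r in rows], index=index, columns=["a", "b", "c"])

a = np.array([2, 5, 0, 2])

outer = df.index.get_level_values(0)
inner = df.index.get_level_values(1)

# Group number (0, 1, 2, ...) of each row's first-level label, by first appearance
group_pos, _ = pd.factorize(outer)
# For every row, the second-level label requested for its group
wanted = a[group_pos]
# Keep rows whose second-level label matches; (1, 5) does not exist, so group 1 is absent
result = df[inner == wanted]
print(result)
```

Because no NaN rows are inserted, the columns keep their integer dtype, unlike the reindex result.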