Home > Back-end >  Find index pair with fewest number of entries in Multiindex data frame
Find index pair with fewest number of entries in Multiindex data frame

Time:11-24

Given a pandas data frame with a 3 (or more) level index. e.g.:

             foo1 foo2 ...
L1  L2  L3
A   a   1    ...  ...
        2    ...  ...
    b   3    ...  ...
B   a   2    ...  ...
        2    ...  ...
        1    ...  ...
C   a   5    ...  ...
        3    ...  ...
    c   0    ...  ...
        9    ...  ...

I want to find the pair L1 & L2 with the fewest number of entries, i.e. (A,b) in above example because it only has one entry: (A,b,3)

My idea was to perform some kind of aggregation on L1 and L2 but I don't think you can make it work on indices. Of course I could access the index values and then loop over them to count entries but I feel that there is a more pandas-ish and more elegant solution.

CodePudding user response:

Using groupby size you can get the number of rows per group:

df.groupby(level=['L1','L2']).size()

Output:

L1  L2
A   a     2
    b     1
B   a     3
C   a     2
    c     2

If you add idxmin, you can get identify the group with the least number of rows:

df.groupby(level=['L1','L2']).size().idxmin()

Output: ('A', 'b')

  • Related