Retain pandas multiindex after function across level

Time:02-23

I'm trying to find the minimum value for each combination of the first two levels of a MultiIndex (self and time in this example), but I'd like to keep the remaining index labels of the rows that hold those minima.

import numpy as np
import pandas as pd

stack = [
        [0, 1, 1, 5],
        [0, 1, 2, 6],
        [0, 1, 3, 2],
        [0, 2, 3, 4],
        [0, 2, 2, 5],
        [0, 3, 2, 1],
        [1, 1, 0, 5],
        [1, 1, 2, 6],
        [1, 1, 3, 7],
        [1, 2, 2, 8],
        [1, 2, 3, 9],
        [2, 1, 7, 1],
        [2, 1, 8, 3],
        [2, 2, 3, 4],
        [2, 2, 8, 1],
        ]

df = pd.DataFrame(stack)
df.columns = ['self', 'time', 'other', 'value']
df.set_index(['self', 'time', 'other'], inplace=True)

df.groupby(level=1).min() doesn't return the correct values:

      value
time       
1         1
2         1
3         1

Doing something like df.groupby(level=[0,1,2]).min() returns the original DataFrame unchanged.

I swear I used to be able to do this by calling .min(level=1), but that now gives me deprecation notices telling me to use the groupby format above. The result seems different from what I remember. Am I stupid?

original:

                 value
self time other       
0    1    1          5
          2          6
          3          2    #<-- min row
     2    3          4    #<-- min row
          2          5
     3    2          1    #<-- min row
1    1    0          5    #<-- min row
          2          6
          3          7
     2    2          8    #<-- min row
          3          9
2    1    7          1    #<-- min row
          8          3
     2    3          4
          8          1    #<-- min row

desired result:

                 value
self time other       
0    1    3          2
     2    3          4
     3    2          1
1    1    0          5
     2    2          8
2    1    7          1
     2    8          1

CodePudding user response:

Group by the first two levels, then use idxmin instead of min to get the full index label of each minimum row. Finally, use loc to select those rows from the original DataFrame:

out = df.loc[df.groupby(level=['self', 'time'])['value'].idxmin()]
print(out)

# Output
                 value
self time other       
0    1    3          2
     2    3          4
     3    2          1
1    1    0          5
     2    2          8
2    1    7          1
     2    8          1
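
One caveat: idxmin returns only the first label when several rows in a group tie for the minimum. If tied rows should all be kept, one option is to compare each value with its group minimum via transform. This is a minimal sketch on a small hypothetical frame (the tied rows and the names tied, group_min, all_min_rows and first_min_rows are made up for illustration and are not part of the question's data):

```python
import pandas as pd

# Small hypothetical frame with the question's index layout; the rows
# (0, 1, 3) and (0, 1, 9) are deliberately tied at value 2.
rows = [
    [0, 1, 1, 5],
    [0, 1, 3, 2],
    [0, 1, 9, 2],
    [0, 2, 3, 4],
    [0, 2, 2, 5],
    [1, 1, 0, 5],
    [1, 1, 2, 6],
]
tied = pd.DataFrame(rows, columns=['self', 'time', 'other', 'value'])
tied = tied.set_index(['self', 'time', 'other'])

# Broadcast each (self, time) group's minimum back onto every row.
group_min = tied.groupby(level=['self', 'time'])['value'].transform('min')

# The boolean mask keeps every row equal to its group minimum, ties included.
all_min_rows = tied[tied['value'] == group_min]

# For comparison: idxmin keeps only the first row of each tied group.
first_min_rows = tied.loc[tied.groupby(level=['self', 'time'])['value'].idxmin()]

print(all_min_rows)
print(first_min_rows)
```

With the question's data there are no ties, so both approaches should select the same rows.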

CodePudding user response:

Why not just group by the first two index levels, rather than all three? Note that this drops the other level from the result:

out = df.groupby(level=[0,1]).min()

Output:

>>> out
           value
self time       
0    1         2
     2         4
     3         1
1    1         5
     2         8
2    1         1
     2         1
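
To make the difference between the two answers concrete, the following sketch rebuilds the question's frame and compares them. The names by_min, by_idxmin and same_values are introduced here for illustration only:

```python
import pandas as pd

# Rebuild the frame from the question.
stack = [
    [0, 1, 1, 5], [0, 1, 2, 6], [0, 1, 3, 2],
    [0, 2, 3, 4], [0, 2, 2, 5],
    [0, 3, 2, 1],
    [1, 1, 0, 5], [1, 1, 2, 6], [1, 1, 3, 7],
    [1, 2, 2, 8], [1, 2, 3, 9],
    [2, 1, 7, 1], [2, 1, 8, 3],
    [2, 2, 3, 4], [2, 2, 8, 1],
]
df = pd.DataFrame(stack, columns=['self', 'time', 'other', 'value'])
df = df.set_index(['self', 'time', 'other'])

# Aggregating with min collapses each (self, time) group to one value,
# so the 'other' level disappears from the index.
by_min = df.groupby(level=[0, 1]).min()

# Selecting by idxmin keeps the original rows, including 'other'.
by_idxmin = df.loc[df.groupby(level=[0, 1])['value'].idxmin()]

print(list(by_min.index.names))
print(list(by_idxmin.index.names))

# Both approaches agree on the minimum values themselves.
same_values = by_idxmin['value'].tolist() == by_min['value'].tolist()
print(same_values)
```

So this answer is enough when only the minimum values are needed, while the idxmin approach is the one that reproduces the desired result with the other label retained.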