How to reset a single index in Pandas Multiindex?-CodePudding

I have a MultiIndex DataFrame built from multiple dataframes and I need to reset the first index. I don't want to change anything but the first index; the entire structure and order of everything else should be preserved.

What I currently have is something like this:

index = pd.MultiIndex.from_tuples([('0', 'Albert'),
                                   ('0', 'Isaac'),
                                   ('0', "Charles"),
                                   ('1', 'James'),
                                   ('1', 'Paul'),
                                   ('0', 'Wolfgang'),
                                   ('0', 'Enrico'),
                                   ('0', "John"),
                                   ('1', 'Marie'),
                                   ('1', 'Carol'),
                                   ('2', "Solomon"),
                                   ("2", "Joseph"),
                                   ("2", "Phil"),
                                   ('2', 'Danielle')],
                                  names=['class', 'name'])
columns = ('High', 'Average')
df = pd.DataFrame([(98.8, 97.9),
                   (100.0, 99.9),
                   (76.5, 64.2),
                   (99.3, 98.9),
                   (87.2, 83.3),
                   (98.8, 96.5),
                   (100.0, 97.7),
                   (88.6, 64.2),
                   (99.3, 78.3),
                   (87.2, 81.0),
                   (78.8, 65.9),
                   (99.0, 95.4),
                   (86.1, 74.7),
                   (97.9, 91.1)],
                  index=index,
                  columns=columns)

This gives a dataframe that looks like:

                 High  Average
class name                    
0     Albert     98.8     97.9
      Isaac     100.0     99.9
      Charles    76.5     64.2
1     James      99.3     98.9
      Paul       87.2     83.3
0     Wolfgang   98.8     96.5
      Enrico    100.0     97.7
      John       88.6     64.2
1     Marie      99.3     78.3
      Carol      87.2     81.0
2     Solomon    78.8     65.9
      Joseph     99.0     95.4
      Phil       86.1     74.7
      Danielle   97.9     91.1

I want to reset the "class" index only to start at 0 and iterate to the last entry in "class." The end result should look like this:

                 High  Average
class name                    
0     Albert     98.8     97.9
      Isaac     100.0     99.9
      Charles    76.5     64.2
1     James      99.3     98.9
      Paul       87.2     83.3
2     Wolfgang   98.8     96.5
      Enrico    100.0     97.7
      John       88.6     64.2
3     Marie      99.3     78.3
      Carol      87.2     81.0
4     Solomon    78.8     65.9
      Joseph     99.0     95.4
      Phil       86.1     74.7
      Danielle   97.9     91.1

I can't figure out how to do this; I've tried using reindex, set_levels; a few other things. I feel like this should be a simple thing to do, so I must be missing something in the options in a built in function (which would be my preferred method, but even something "hacky" at this point would suffice!).

CodePudding user response：

Create a new index from your first index level, then merge with the second and set it as new one:

import numpy as np

idx = df.index.get_level_values('class')
new_idx = pd.Index((idx != np.roll(idx, 1)).cumsum() -1, name='class')

df = df.set_index(pd.MultiIndex.from_arrays(
                      [new_idx, df.index.get_level_values('name')]))

>>> df
                 High  Average
class name                    
0     Albert     98.8     97.9
      Isaac     100.0     99.9
      Charles    76.5     64.2
1     James      99.3     98.9
      Paul       87.2     83.3
2     Wolfgang   98.8     96.5
      Enrico    100.0     97.7
      John       88.6     64.2
3     Marie      99.3     78.3
      Carol      87.2     81.0
4     Solomon    78.8     65.9
      Joseph     99.0     95.4
      Phil       86.1     74.7
      Danielle   97.9     91.1

Solution improved by @It_is_Chris

CodePudding user response：

One option is to reset index then create the groups out of it then reindex back

df2 = df.reset_index()
df2['class'] = (df2['class'] != df2['class'].shift()).cumsum() - 1
df = df2.set_index(['class', 'name'])
>>>
                 High  Average
class name
0     Albert     98.8     97.9
      Isaac     100.0     99.9
      Charles    76.5     64.2
1     James      99.3     98.9
      Paul       87.2     83.3
2     Wolfgang   98.8     96.5
      Enrico    100.0     97.7
      John       88.6     64.2
3     Marie      99.3     78.3
      Carol      87.2     81.0
4     Solomon    78.8     65.9
      Joseph     99.0     95.4
      Phil       86.1     74.7
      Danielle   97.9     91.1

CodePudding user response：

One option is

df = df.reset_index()
df['class'] = df['class'].astype(int).diff().abs().cumsum().fillna(0).astype(int)
df.set_index(['class', 'name'])

This also does not work with the index, so it is set before and after some changes in the class column