I have a MultiIndex DataFrame built from multiple dataframes and I need to reset the first index. I don't want to change anything but the first index; the entire structure and order of everything else should be preserved.
What I currently have is something like this:
index = pd.MultiIndex.from_tuples([('0', 'Albert'),
('0', 'Isaac'),
('0', "Charles"),
('1', 'James'),
('1', 'Paul'),
('0', 'Wolfgang'),
('0', 'Enrico'),
('0', "John"),
('1', 'Marie'),
('1', 'Carol'),
('2', "Solomon"),
("2", "Joseph"),
("2", "Phil"),
('2', 'Danielle')],
names=['class', 'name'])
columns = ('High', 'Average')
df = pd.DataFrame([(98.8, 97.9),
(100.0, 99.9),
(76.5, 64.2),
(99.3, 98.9),
(87.2, 83.3),
(98.8, 96.5),
(100.0, 97.7),
(88.6, 64.2),
(99.3, 78.3),
(87.2, 81.0),
(78.8, 65.9),
(99.0, 95.4),
(86.1, 74.7),
(97.9, 91.1)],
index=index,
columns=columns)
This gives a dataframe that looks like:
High Average
class name
0 Albert 98.8 97.9
Isaac 100.0 99.9
Charles 76.5 64.2
1 James 99.3 98.9
Paul 87.2 83.3
0 Wolfgang 98.8 96.5
Enrico 100.0 97.7
John 88.6 64.2
1 Marie 99.3 78.3
Carol 87.2 81.0
2 Solomon 78.8 65.9
Joseph 99.0 95.4
Phil 86.1 74.7
Danielle 97.9 91.1
I want to reset the "class" index only to start at 0 and iterate to the last entry in "class." The end result should look like this:
High Average
class name
0 Albert 98.8 97.9
Isaac 100.0 99.9
Charles 76.5 64.2
1 James 99.3 98.9
Paul 87.2 83.3
2 Wolfgang 98.8 96.5
Enrico 100.0 97.7
John 88.6 64.2
3 Marie 99.3 78.3
Carol 87.2 81.0
4 Solomon 78.8 65.9
Joseph 99.0 95.4
Phil 86.1 74.7
Danielle 97.9 91.1
I can't figure out how to do this; I've tried using reindex, set_levels; a few other things. I feel like this should be a simple thing to do, so I must be missing something in the options in a built in function (which would be my preferred method, but even something "hacky" at this point would suffice!).
CodePudding user response:
Create a new index from your first index level, then merge with the second and set it as new one:
import numpy as np
idx = df.index.get_level_values('class')
new_idx = pd.Index((idx != np.roll(idx, 1)).cumsum() -1, name='class')
df = df.set_index(pd.MultiIndex.from_arrays(
[new_idx, df.index.get_level_values('name')]))
>>> df
High Average
class name
0 Albert 98.8 97.9
Isaac 100.0 99.9
Charles 76.5 64.2
1 James 99.3 98.9
Paul 87.2 83.3
2 Wolfgang 98.8 96.5
Enrico 100.0 97.7
John 88.6 64.2
3 Marie 99.3 78.3
Carol 87.2 81.0
4 Solomon 78.8 65.9
Joseph 99.0 95.4
Phil 86.1 74.7
Danielle 97.9 91.1
Solution improved by @It_is_Chris
CodePudding user response:
One option is to reset index then create the groups out of it then reindex back
df2 = df.reset_index()
df2['class'] = (df2['class'] != df2['class'].shift()).cumsum() - 1
df = df2.set_index(['class', 'name'])
>>>
High Average
class name
0 Albert 98.8 97.9
Isaac 100.0 99.9
Charles 76.5 64.2
1 James 99.3 98.9
Paul 87.2 83.3
2 Wolfgang 98.8 96.5
Enrico 100.0 97.7
John 88.6 64.2
3 Marie 99.3 78.3
Carol 87.2 81.0
4 Solomon 78.8 65.9
Joseph 99.0 95.4
Phil 86.1 74.7
Danielle 97.9 91.1
CodePudding user response:
One option is
df = df.reset_index()
df['class'] = df['class'].astype(int).diff().abs().cumsum().fillna(0).astype(int)
df.set_index(['class', 'name'])
This also does not work with the index, so it is set before and after some changes in the class
column