Please take this question lightly as asked from curiosity:
As I was trying to see how the slicing in MultiIndex works, I came across the following situation ↓
# Simple MultiIndex Creation
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
# Making Series with that MultiIndex
data = pd.Series(np.random.randint(10, size=6), index=index)
Returns:
a 1 5 2 0 c 1 8 2 6 b 1 6 2 3 dtype: int32
NOTE that the indices are not in the sorted order ie. a, c, b
is the order which will result in the expected error that we want while slicing.
# When we do slicing
data.loc["a":"c"]
Errors like:
UnsortedIndexError ----> 1 data.loc["a":"c"] UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
That's expected. But now, after doing the following steps:
# Making a DataFrame
data = data.unstack()
# Redindexing - to unsort the indices like before
data = data.reindex(["a", "c", "b"])
# Which looks like
1 2
a 5 0
c 8 6
b 6 3
# Then again making series
data = data.stack()
# Reindex Again!
data = data.reindex(["a", "c", "b"], level=0)
# Which looks like before
a 1 5
2 0
c 1 8
2 6
b 1 6
2 3
dtype: int32
The Problem
So, now the process is: Series → Unstack → DataFrame → Stack → Series
Now, if I do the slicing like before (still on with the indices unsorted) we don't get any error!
# The same slicing
data.loc["a":"c"]
Results without an error:
a 1 5 2 0 c 1 8 2 6 dtype: int32
Even if the data.index.is_monotonic
→ False
. Then still why can we slice?
So the question is: WHY?.
I hope you got the understanding of the situation here. Because see, the same series which was before giving the error, after the
unstack
andstack
operation is not giving any error.
So is that a bug, or a new concept that I am missing here?
Thanks!
Aayush ∞ Shah
UPDATE:
I have used the data.reindex()
so to unsort that once more. Please have a look at it again.
CodePudding user response:
The difference between you 2 dataframes is the following:
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.random.randint(10, size=6), index=index)
data2 = data.unstack().reindex(["a", "c", "b"]).stack()
>>> data.index.codes
FrozenList([[0, 0, 2, 2, 1, 1], [0, 1, 0, 1, 0, 1]])
>>> data2.index.codes
FrozenList([[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
Even if your two indexes are the same appearance (values), the internal index (codes) are differents.
Check this method of MultiIndex
:
Create a new MultiIndex from the current to monotonically sorted
items IN the levels. This does not actually make the entire MultiIndex
monotonic, JUST the levels.
The resulting MultiIndex will have the same outward
appearance, meaning the same .values and ordering. It will also
be .equals() to the original.
Old answer
# Making a DataFrame
data = data.unstack()
# Which looks like # <- WRONG
1 2 # 1 2
a 5 0 # a 8 0
c 8 6 # b 4 1
b 6 3 # c 7 6
# Then again making series
data = data.stack()
# Which looks like before # <- WRONG
a 1 5 # a 1 2
2 0 # 2 1
c 1 8 # b 1 0
2 6 # 2 1
b 1 6 # c 1 3
2 3 # 2 9
dtype: int32
If you want to use slicing, you have to check if the index is monotonic:
# Simple MultiIndex Creation
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
# Making Series with that MultiIndex
data = pd.Series(np.random.randint(10, size=6), index=index)
>>> data.index.is_monotonic
False
>>> data.unstack().stack().index.is_monotonic
True
>>> data.sort_index().index.is_monotonic
True