Suppose I have the following MultiIndex Series, named df:
import numpy as np
import pandas as pd
arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.Series(np.random.randn(8), index=index)
If I wanted to access all the rows associated with baz, for example, I would use a cross-section: df.xs("baz").
But is there a way to access the rows by the integer position of a label in the first level, similar to iloc for single-index DataFrames? In my example, I think baz would be at position 1.
I attempted a workaround using .loc, as follows:
df.loc[[df.index.get_level_values(0)[1]]]
But that returns the first group of rows, the ones associated with bar. I believe this is because get_level_values(0) has one entry per row, so integer location 1 is still within bar. I would have to use 2 to get to baz.
Can I make it so that locations 0, 1, 2, and 3 refer to bar, baz, foo, and qux respectively?
CodePudding user response:
You can use levels:
df.xs(df.index.levels[0][1])
second
one   -1.052578
two    0.565691
dtype: float64
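More details: df.index.levels[0] holds the unique labels of the first level, so indexing it by position returns the label at that position.
df.index.levels[0][0]
'bar'
df.index.levels[0][1]
'baz'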
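One caveat with levels: pandas stores them sorted, and they can still contain labels that no longer appear after you filter or reorder the Series. If you want position to mean "order of appearance in the data", here is a minimal sketch that uses get_level_values(...).unique() instead. The helper name xs_by_position is hypothetical (it is not a pandas function), and the data uses np.arange rather than random values only so the result is reproducible.

```python
import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
# Deterministic values (0.0 .. 7.0) instead of random ones, for illustration.
df = pd.Series(np.arange(8.0), index=index)


def xs_by_position(obj, pos, level=0):
    """Hypothetical helper: cross-section for the pos-th distinct label
    of `level`, counted in order of first appearance."""
    labels = obj.index.get_level_values(level).unique()
    return obj.xs(labels[pos], level=level)


# Position 1 in the first level is "baz".
baz_rows = xs_by_position(df, 1)

# After reordering/filtering, levels[0] still lists all four labels,
# while the unique level values reflect what is actually present.
reordered = df.iloc[[6, 7, 0, 1]]
stale_levels = list(reordered.index.levels[0])
present_labels = list(reordered.index.get_level_values(0).unique())
first_group = xs_by_position(reordered, 0)  # the "qux" rows
```

If you prefer to keep using levels after filtering, calling remove_unused_levels() on the index first drops the labels that are no longer present.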
CodePudding user response:
You could use ngroup with groupby and level=0:
df[df.groupby(level=0).ngroup() == 1]
# where 1 is the position of the second group
Output:
first  second
baz    one       0.589972
       two      -0.040558
dtype: float64
Where df.groupby(level=0).ngroup() returns:
first  second
bar    one       0
       two       0
baz    one       1
       two       1
foo    one       2
       two       2
qux    one       3
       two       3
dtype: int64
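Because ngroup gives every row its group number, the same idea extends to picking several groups at once, which is closer to what iloc does with a list of positions. A minimal sketch, assuming the same index as in the question but with np.arange values so the result is reproducible:

```python
import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
# Deterministic values (0.0 .. 7.0) instead of random ones, for illustration.
df = pd.Series(np.arange(8.0), index=index)

# One group number per row: 0 for bar, 1 for baz, 2 for foo, 3 for qux.
group_no = df.groupby(level=0).ngroup()

# Keep the rows whose group number is 1 or 3, i.e. baz and qux.
selected = df[group_no.isin([1, 3])]
```

Note that groupby sorts the group keys by default, so the numbers follow sorted label order. I believe passing sort=False to groupby numbers the groups in order of first appearance instead, but check that on your pandas version.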