How to access rows in a MultiIndex dataframe by using integer-location based indexing

Time:11-17

Suppose I have the following MultiIndex object, named df (strictly a Series, since it is built with pd.Series):

import numpy as np
import pandas as pd

# Two parallel lists: the first becomes level "first", the second level "second"
arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

df = pd.Series(np.random.randn(8), index=index)

If I wanted to access all the rows associated with baz, for example, I would use a cross-section: df.xs('baz').

But is there a way to access the rows by referencing the integer location in the first level, similar to iloc for single index DataFrames? In my example, I think that would be index location 1.

I attempted it with a workaround using .loc as per the following:

df.loc[[df.index.get_level_values(0)[1]]]

But that returns the first group of rows, the ones associated with bar. I believe that is because get_level_values(0) returns one label per row, so integer location 1 is still within bar. I would have to reference 2 to get to baz.

Can I make it so that locations 0, 1, 2, and 3 reference bar, baz, foo, and qux respectively?
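
To see why the .loc attempt lands on bar, it may help to compare the per-row labels with the per-group labels. This is only a sketch: it uses np.arange instead of random values so the rows are easy to recognise, and the names row_labels and group_labels are made up for illustration.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
# Deterministic values (0.0 .. 7.0) instead of np.random.randn, for readability
df = pd.Series(np.arange(8.0), index=index)

# One label per row: position 1 is still the second "bar" row
row_labels = df.index.get_level_values(0)

# One label per group, in order of first appearance
group_labels = row_labels.unique()

print(list(row_labels))
print(list(group_labels))

# Position 1 among the unique labels is "baz"; .loc with a list keeps both index levels
print(df.loc[[group_labels[1]]])
```

De-duplicating the labels first is what turns a row position into a group position.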

CodePudding user response:

You can use the levels attribute of the index, which holds the unique labels of each level:

df.xs(df.index.levels[0][1])
second
one   -1.052578
two    0.565691
dtype: float64

More details

df.index.levels[0][0]
'bar'
df.index.levels[0][1]
'baz'
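
One thing to watch with levels, as far as I know: it can still list labels that no longer occur in the data, for example after filtering rows. A rough sketch of a wrapper that guards against this is below. The function name rows_at_level_position is hypothetical, and the data uses np.arange rather than random values so the result is predictable.

```python
import numpy as np
import pandas as pd

def rows_at_level_position(obj, pos, level=0):
    """Hypothetical helper: cross-section for the pos-th label of the given index level."""
    # remove_unused_levels() drops labels that no longer occur in the index
    labels = obj.index.remove_unused_levels().levels[level]
    return obj.xs(labels[pos], level=level)

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.Series(np.arange(8.0), index=index)

# Position 1 in the first level is "baz"
baz_rows = rows_at_level_position(df, 1)
print(baz_rows)

# After dropping "bar", position 0 should refer to "baz", not to the stale "bar" label
filtered = df[df.index.get_level_values(0) != "bar"]
first_rows = rows_at_level_position(filtered, 0)
print(first_rows)
```

Like df.xs, this drops the selected level from the result, so the output is indexed by second only.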

CodePudding user response:

You could use ngroup with groupby and level=0:

df[(df.groupby(level=0).ngroup() == 1)] 
# where 1 is the index of the second group

Output:

first  second
baz    one       0.589972
       two      -0.040558
dtype: float64

Here, df.groupby(level=0).ngroup() returns:

first  second
bar    one       0
       two       0
baz    one       1
       two       1
foo    one       2
       two       2
qux    one       3
       two       3
dtype: int64
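
Since ngroup gives every row its group number, the same idea should extend to several positions at once by swapping == for isin. A small sketch, again with np.arange values instead of random ones; group_no and selected are illustrative names.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.Series(np.arange(8.0), index=index)

# Group number of each row, numbered by the sorted first-level labels
group_no = df.groupby(level=0).ngroup()

# Keep the rows of the groups at positions 1 and 3 (baz and qux here)
selected = df[group_no.isin([1, 3])]
print(selected)
```

Unlike xs, the boolean mask keeps both index levels in the result.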