Home > Enterprise >  Pandas multiindex: select by condition
Pandas multiindex: select by condition

Time:04-11

I'm failing with accessing the values from multi-index/columns DataFrame by condition.

df = pd.DataFrame(np.arange(16).reshape(4,4),
                  index=[['a', 'a', 'b', 'b'], ["c", "d", "c", "d"]],
                  columns=[['cat1', 'cat1', 'cat2', 'cat2'], ['vals1', 'vals2', 'vals1', 'vals2']])
     cat1        cat2      
    vals1 vals2 vals1 vals2
a c     0     1     2     3
  d     4     5     6     7
b c     8     9    10    11
  d    12    13    14    15

I know how to access values|subframes by their position e.g.

df.loc[("a", "c"), ("cat1", "vals1")]
df.loc[(slice(None), "d"), ("cat1", slice(None))]

BUT how to use the condition here

e.g. select the value in column ("cat1", "vals2") where ("cat2", "vals2")==7

CodePudding user response:

Use tuples in DataFrame.loc:

a = df.loc[df[("cat2", "vals2")] == 7, ('cat1', 'vals1')]
print (a)

Last if need scalar from one element Series:

out = a.iat[0]

If possible no match:

out = next(iter(a), 'no match')

You can compare sliced rows, columns - output is DataFrame filled by boolean - so for boolean Series need test if any True per row by DataFrame.any or all Trues per rows by DataFrame.all:

m = df.loc[:, ("cat2", slice(None))]==7

a = df.loc[m.any(axis=1), ("cat1", "vals2")]
print (a)
a  d    5
Name: (cat1, vals2), dtype: int32

m = df.loc[:, ("cat2", slice(None))]==7
df2 = df.loc[m.any(axis=1), ("cat1", slice(None))]
print (df2)
     cat1      
    vals1 vals2
a d     4     5

CodePudding user response:

You can use loc with boolean indexing:

df.loc[df[('cat2', 'vals2')].eq(7), ('cat1', 'vals2')]

output:

a  d    5
Name: (cat1, vals2), dtype: int64

CodePudding user response:

A very similar approach to @mozway would be the use of pd.IndexSlice, as pandas docs suggests in the "Using Slices" section.

idx = pandas.IndexSlice

mask = df[('cat2', 'vals2')]==7

df.loc[idx[mask], idx['cat1', "vals2"]]
  • Related