I'm failing with accessing the values from multi-index/columns DataFrame by condition.
df = pd.DataFrame(np.arange(16).reshape(4,4),
index=[['a', 'a', 'b', 'b'], ["c", "d", "c", "d"]],
columns=[['cat1', 'cat1', 'cat2', 'cat2'], ['vals1', 'vals2', 'vals1', 'vals2']])
cat1 cat2
vals1 vals2 vals1 vals2
a c 0 1 2 3
d 4 5 6 7
b c 8 9 10 11
d 12 13 14 15
I know how to access values|subframes by their position e.g.
df.loc[("a", "c"), ("cat1", "vals1")]
df.loc[(slice(None), "d"), ("cat1", slice(None))]
BUT how to use the condition here
e.g. select the value in column ("cat1", "vals2") where ("cat2", "vals2")==7
CodePudding user response:
Use tuples in DataFrame.loc
:
a = df.loc[df[("cat2", "vals2")] == 7, ('cat1', 'vals1')]
print (a)
Last if need scalar from one element Series
:
out = a.iat[0]
If possible no match:
out = next(iter(a), 'no match')
You can compare sliced rows, columns - output is DataFrame filled by boolean - so for boolean Series
need test if any True per row by DataFrame.any
or all Trues per rows by DataFrame.all
:
m = df.loc[:, ("cat2", slice(None))]==7
a = df.loc[m.any(axis=1), ("cat1", "vals2")]
print (a)
a d 5
Name: (cat1, vals2), dtype: int32
m = df.loc[:, ("cat2", slice(None))]==7
df2 = df.loc[m.any(axis=1), ("cat1", slice(None))]
print (df2)
cat1
vals1 vals2
a d 4 5
CodePudding user response:
You can use loc
with boolean indexing:
df.loc[df[('cat2', 'vals2')].eq(7), ('cat1', 'vals2')]
output:
a d 5
Name: (cat1, vals2), dtype: int64
CodePudding user response:
A very similar approach to @mozway would be the use of pd.IndexSlice, as pandas docs suggests in the "Using Slices" section.
idx = pandas.IndexSlice
mask = df[('cat2', 'vals2')]==7
df.loc[idx[mask], idx['cat1', "vals2"]]