I have a pandas dataframe df
that I built using 3 levels of columns, as follows:
a1 a2 a3
b1 b2 b1 b3 b1 b4
c1 c2 c1 c2 c1 c2 c1 c2 c1 c2 c1 c2
... (data) ...
Note that each a
column may have different b
subcolumns, but each b
column has the same c
subcolumns.
I can extract e.g. the subcolumns from a2
using df["a2"]
.
How can I select based on the second or third level without having to specify the first and second level respectively? For instance I would like to say "give me all the c2
columns you can find" and I would get:
a1 a2 a3
b1 b2 b1 b3 b1 b4
... (data for the c2 columns) ...
Or "give me all the b1
columns" and I would get:
a1 a2 a3
c1 c2 c1 c2 c1 c2
... (data for the b1 columns) ...
CodePudding user response:
The docs provide some info on that. Adapting the examples from there to your example, either use tuples with slice
objects you pass None
,
df.loc[:, (slice(None), slice(None), "c2")]
or use pd.IndexSlice
to use the familiar :
notation:
idx = pd.IndexSlice
df.loc[:, idx[:, :, "c2"]]
CodePudding user response:
When you have a dataframe with a MultiIndex
the columns are a list of tuples containing the keys of each level.
You can get the columns you want with a list comprehension checking if the name you want is in the tuple:
# get all the "c2"
df[[col for col in df.columns if "c2" in col]]
# get all the "b1"
df[[col for col in df.columns if "b1" in col]]
If you want to be sure that the column name is in the right position:
# get all the "c2"
df[[col for col in df.columns if col[2] == "c2"]]
# get all the "b1"
df[[col for col in df.columns if col[1] == "b1"]]