After constructing a 3-d pandas dataframe I have difficulty accessing only specific columns.
The scenario is as follows:
head = ["h1", "h2"]
cols = ["col_1", "col_2", "col_3"]
heads = len(cols) * [head[0]] len(cols) * [head[1]] # -> ['h1','h1','h1','h2','h2','h2']
no_of_rows = 4
A = np.array(heads)
B = np.array(cols * len(head)) # -> ['col_1','col_2','col_3','col_1','col_2','col_3']
C = np.array([np.zeros(no_of_rows)] * len(head) * len(cols)) # -> shape=(6, 4)
df = pd.DataFrame(data=C.T,
columns=pd.MultiIndex.from_tuples(zip(A,B)))
yielding
h1 h2
col_1 col_2 col_3 col_1 col_2 col_3
0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0
Now I would like to get e.g. all col_1
, meaning col_1
of h1
and col_1
of h2
. The output should look like this
h1 h2
col_1 col_1
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
Any suggestions how I could access those two columns?
CodePudding user response:
You can use loc
with get_level_values(1)
, as your columns col1, col2, col3 are in the first level of your index:
>>> df.loc[:,df.columns.get_level_values(1).isin(['col_1'])]
h1 h2
col_1 col_1
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
If you want to grab all columns under h1
, you can set get_level_values(0)
, and grab h1
:
>>> df.loc[:,df.columns.get_level_values(0).isin(['h1'])]
h1
col_1 col_2 col_3
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
CodePudding user response:
You can use df.loc
with slice(None)
, as follows:
df.loc[:, (slice(None), 'col_1')]
or use pd.IndexSlice
, as follows:
idx = pd.IndexSlice
df.loc[:, idx[:, 'col_1']]
or simply:
df.loc[:, pd.IndexSlice[:, 'col_1']]
(Defining extra variable idx
for pd.IndexSlice
is useful as a shorthand if you are going to use pd.IndexSlice
multiple times. )
Result:
h1 h2
col_1 col_1
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
You can also do it with .xs()
as follows:
df.xs('col_1', level=1, axis=1)
Result:
h1 h2
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
Slightly different output without the repeating col_1
column labels shown.
The first 2 ways support selecting multiple columns too, e.g. ['col_1', 'col_3']
:
df.loc[:, (slice(None), ['col_1', 'col_3'])]
and also:
df.loc[:, pd.IndexSlice[:, ['col_1', 'col_3']]]
Result:
h1 h2
col_1 col_3 col_1 col_3
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0