I'm trying to concat two DataFrames, then access the concatenated result using the keys I provided. However, it doesn't seem to work as per the documentation, and I'm wondering what I'm doing wrong here.
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])
dfcomb
Out[82]:
       col1  col2
df0 0     1     2
    1     3     4
df1 0     5     7
    1     9    11
dfcomb.index
Out[83]:
MultiIndex([('df0', 0),
            ('df0', 1),
            ('df1', 0),
            ('df1', 1)],
           )
So far, so good. We've got a multilevel index. Now, according to the documentation:
One of the important features of hierarchical indexing is that you can select data by a “partial” label identifying a subgroup in the data. Partial selection “drops” levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.
So, I should be able to access the df0 rows using dfcomb["df0"], right? No!
dfcomb["df0"]
Traceback (most recent call last):
...
File "C:\Users\user\Anaconda3\envs\test_updatd_env\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'df0'
Why doesn't this work?
CodePudding user response:
The MultiIndex in dfcomb is in the index, not the columns, so you have to use .loc[] instead of just []:
>>> dfcomb.loc['df0']
   col1  col2
0     1     2
1     3     4
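To make the difference concrete, here is a small sketch that reuses the frames from the question. On a DataFrame, plain [] looks up column labels, while .loc[] looks up row labels, so the partial selection described in the documentation applies to whichever axis holds the MultiIndex. The variable names rows, cols, by_loc, and by_brackets are only for illustration; the axis=1 variant is not from the question and is shown purely for comparison.

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])

# Default axis=0: the keys become the outer level of the ROW index.
rows = pd.concat([df0, df1], keys=["df0", "df1"])
by_loc = rows.loc["df0"]   # partial label on the index; the outer level is dropped
col1 = rows["col1"]        # [] selects a column, so "df0" is not a valid key here

# axis=1: the keys become the outer level of the COLUMNS instead,
# and in that layout plain [] does accept the partial label.
cols = pd.concat([df0, df1], axis=1, keys=["df0", "df1"])
by_brackets = cols["df0"]

print(by_loc)
print(by_brackets)
```

Both selections give back the two rows of df0 with columns col1 and col2.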
CodePudding user response:
Try using .loc:
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])
print(dfcomb)

# Select every row whose outer index level equals the label.
label = "df0"
print(dfcomb.loc[label])
Output (of the second print):
   col1  col2
0     1     2
1     3     4
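If it helps, the same .loc accessor also takes a full (outer, inner) tuple to reach a single row or cell, and DataFrame.xs is another way to take the cross-section for one key. This is just a sketch on the same data; the names block, row, and cell are made up for the example.

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])

# Cross-section on the outer level; same rows as dfcomb.loc["df1"].
block = dfcomb.xs("df1")

# A full tuple addresses one row (returned as a Series).
row = dfcomb.loc[("df0", 1)]

# Tuple plus column label addresses a single cell.
cell = dfcomb.loc[("df1", 0), "col2"]

print(block)
print(row)
print(cell)
```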