Pandas dataframe multilevel indexing with concat - why doesn't this work?

Time:12-17

I'm trying to concat two DataFrames, then access the concatenated result using the keys I provided. However, it doesn't seem to work as per the documentation, and I'm wondering what I'm doing wrong here.

import pandas as pd
df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])
dfcomb
Out[82]: 
       col1  col2
df0 0     1     2
    1     3     4
df1 0     5     7
    1     9    11

dfcomb.index
Out[83]: 
MultiIndex([('df0', 0),
            ('df0', 1),
            ('df1', 0),
            ('df1', 1)],
           )

So far, so good. We've got a multilevel index. Now, according to the documentation:

One of the important features of hierarchical indexing is that you can select data by a “partial” label identifying a subgroup in the data. Partial selection “drops” levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.

So, I should be able to access the df0 rows using dfcomb["df0"], right? No!

dfcomb["df0"]
Traceback (most recent call last):
...
 File "C:\Users\user\Anaconda3\envs\test_updatd_env\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'df0'

Why doesn't this work?

CodePudding user response:

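The MultiIndex in dfcomb is on the rows (the index), not the columns. Plain [] on a DataFrame looks the key up in the columns, which is why dfcomb["df0"] raises a KeyError. To select rows by label, use .loc[] instead of just []: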

>>> dfcomb.loc['df0']
   col1  col2
0     1     2
1     3     4
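
As a minimal sketch of the difference, using the same frames as in the question: [] goes to the columns, while .loc and xs go to the row index. The variable names (col1, sub, row, sub_xs) are only for illustration.

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])

# [] on a DataFrame looks the key up in the columns, so this works...
col1 = dfcomb["col1"]
# ...while "df0" is a row label, not a column, hence the KeyError.

# .loc with a partial label selects the matching rows
# and drops the outer index level from the result.
sub = dfcomb.loc["df0"]

# A full tuple label selects a single row, returned as a Series.
row = dfcomb.loc[("df1", 0)]

# xs is an alternative way to take a cross-section on a given level.
sub_xs = dfcomb.xs("df0", level=0)
```

Both sub and sub_xs hold the two rows that came from df0, indexed by 0 and 1.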

CodePudding user response:

Try using .loc:

import pandas as pd
df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])
dfcomb = pd.concat([df0, df1], keys=["df0", "df1"])
print(dfcomb)

label="df0"
print(dfcomb.loc[label])

Output:

   col1  col2
0     1     2
1     3     4
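
If you would rather keep the dfcomb["df0"] syntax, one possible approach (an assumption about what you want, since it changes the shape of the result) is to concatenate along the columns with axis=1. The keys then become the outer level of the columns, and [] does partial selection there. The names wide and left below are only for illustration.

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])
df1 = pd.DataFrame([[5, 7], [9, 11]], columns=["col1", "col2"])

# With axis=1 the frames are placed side by side and the keys
# become the outer level of a column MultiIndex.
wide = pd.concat([df0, df1], axis=1, keys=["df0", "df1"])

# [] now finds "df0" in the columns and drops that level in the result.
left = wide["df0"]
print(left)
```

Note that this lays the two frames out side by side rather than stacking them, so it only fits if that layout suits your data.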