how to access dataframe index after concatenating 2 dataframes: one with multiindex, one without

Time:09-17

I want to access a column/index in a dataframe that is the concatenation of 2 dataframes: one that has a MultiIndex and another that doesn't. My questions are in the code comments.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])

df2_cols = pd.MultiIndex.from_tuples([("c", 1), ("d", 2)])
df2 = pd.DataFrame(data=np.ones((2, 2)), columns=df2_cols)

df = pd.concat([df1, df2], axis=1)
print(df)

Output:

     a    b  (c, 1)  (d, 2)
0  1.0  1.0     1.0     1.0
1  1.0  1.0     1.0     1.0
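A quick check (a sketch added here, not part of the original post) suggests why plain .loc struggles: after the concat, the columns are an ordinary flat Index of objects whose last two labels happen to be tuples. They are no longer a MultiIndex.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2_cols = pd.MultiIndex.from_tuples([("c", 1), ("d", 2)])
df2 = pd.DataFrame(data=np.ones((2, 2)), columns=df2_cols)
df = pd.concat([df1, df2], axis=1)

# The concatenated columns are a flat Index, not a MultiIndex
print(isinstance(df.columns, pd.MultiIndex))  # False

# The former MultiIndex entries survive only as tuple labels
print(df.columns.tolist())  # ['a', 'b', ('c', 1), ('d', 2)]

# A tuple is still a valid single label for a membership test,
# which is consistent with df[("c", 1)] working
print(("c", 1) in df.columns)  # True
```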

Now accessing different parts of the new DataFrame:

df1.loc[:, "a"]  # works
df.loc[:, "a"]   # works
df2.loc[:, ("c", 1)] # works
df.loc[:, ("c", 1)] # raises KeyError -> is it possible to access this column using loc?

# even this fails, although I am using the label taken directly from df.columns:
df.loc[:, df.columns[2]]

Error:

KeyError: "None of [Index(['c', 1], dtype='object')] are in the [columns]"

df[("c", 1)] # interestingly works

df = df.T
df.loc[("c", 1)] # raises KeyError -> is it possible to access this index using loc?

I know I can use iloc, or the approach from "join multiindex dataframe with single-index dataframe breaks multiindex format", which keeps the MultiIndex format in the new dataframe. But I am wondering whether it is possible without that.
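As a sketch of the iloc route mentioned above, assuming the frame built in the question: Index.get_loc accepts the tuple as one hashable label, so the integer position does not have to be hard-coded. The names col_pos, row_pos and df_t are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(np.ones((2, 2)),
                   columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]))
df = pd.concat([df1, df2], axis=1)

# get_loc treats the tuple as a single label and returns its position
col_pos = df.columns.get_loc(("c", 1))
col = df.iloc[:, col_pos]  # Series named ('c', 1)

# Same idea on the row axis of the transposed frame
df_t = df.T
row_pos = df_t.index.get_loc(("c", 1))
row = df_t.iloc[row_pos]  # Series named ('c', 1)

print(col_pos, row_pos)  # 2 2
print(col.tolist(), row.tolist())  # [1.0, 1.0] [1.0, 1.0]
```

This keeps the flat column format from the concat and only translates the label into a position at the moment of access.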

Answer:

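Seems like a bug to me: on a flat (non-MultiIndex) axis, .loc treats the tuple as a list of two separate labels, 'c' and 1, which is what the KeyError message (Index(['c', 1], dtype='object')) shows.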

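You can cheat and use 2D selection: wrap the tuple in a list, so that .loc looks it up as a single label. The result is a one-column DataFrame: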

df.loc[:, [('c', 1)]]

Output:

   (c, 1)
0     1.0
1     1.0
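The question's transposed case (df.loc[("c", 1)] after df = df.T) should yield to the same trick on the row axis. A minimal sketch, assuming the frame from the question; one_row and row are illustrative names. Wrapping the tuple in a list makes .loc treat it as one row label.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(np.ones((2, 2)),
                   columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]))
df_t = pd.concat([df1, df2], axis=1).T

# A list holding one tuple selects exactly one row label
one_row = df_t.loc[[("c", 1)]]  # DataFrame with a single row
print(one_row.shape)  # (1, 2)

# Take that row out as a Series by position
row = one_row.iloc[0]
print(row.name)  # ('c', 1)
```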

Assignment through the same indexer also works:

df.loc[:, [('c', 1)]] = [8, 9]

Updated DataFrame:

     a    b  (c, 1)  (d, 2)
0  1.0  1.0       8     1.0
1  1.0  1.0       9     1.0
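Since the question notes that df[("c", 1)] already works for reading, plain bracket assignment with the tuple label is another option worth trying. A minimal sketch, assuming the frame from the question; the resulting dtype of the column may differ between pandas versions, so only the values and shape are checked here.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(np.ones((2, 2)),
                   columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]))
df = pd.concat([df1, df2], axis=1)

# The tuple is found as one existing column label, so this overwrites
# column ('c', 1) rather than adding a new column
df[("c", 1)] = [8, 9]

print(df[("c", 1)].tolist())
print(df.shape)  # (2, 4)
```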

If you need a Series:

df.loc[:, [('c', 1)]].squeeze()

Output:

0    1.0
1    1.0
Name: (c, 1), dtype: float64
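One caveat worth knowing: squeeze() collapses every length-1 axis, so on a frame with a single row the same call returns a scalar rather than a Series. A sketch, using a hypothetical one-row version of the question's frame; selecting the first column with .iloc[:, 0] keeps a Series in either case.

```python
import numpy as np
import pandas as pd

# Same construction as the question, but with a single row (illustrative)
df1 = pd.DataFrame(np.ones((1, 2)), columns=["a", "b"])
df2 = pd.DataFrame(np.ones((1, 2)),
                   columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]))
df = pd.concat([df1, df2], axis=1)

picked = df.loc[:, [("c", 1)]]  # 1 x 1 DataFrame

# squeeze() drops both length-1 axes here and yields a scalar
print(isinstance(picked.squeeze(), pd.Series))  # False

# iloc on the column axis returns a Series for one column
col = picked.iloc[:, 0]
print(isinstance(col, pd.Series), col.name)  # True ('c', 1)
```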