I want to access a column/index in a DataFrame that is the concatenation of two DataFrames, one that has a MultiIndex and another that doesn't. The questions are inside the code.
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2_cols = pd.MultiIndex.from_tuples([("c", 1), ("d", 2)])
df2 = pd.DataFrame(data=np.ones((2, 2)), columns=df2_cols)
df = pd.concat([df1, df2], axis=1)
print(df)
Output:
     a    b  (c, 1)  (d, 2)
0  1.0  1.0     1.0     1.0
1  1.0  1.0     1.0     1.0
Now accessing different parts of the new DataFrame:
df1.loc[:, "a"] # works
df.loc[:, "a"] # works
df2.loc[:, ("c", 1)] # works
df.loc[:, ("c", 1)] # crashes -> is it possible to access this column using loc?
# even this crashes, where I am directly using the name provided by the dataframe column:
df.loc[:, df.columns[2]]
Error:
KeyError: "None of [Index(['c', 1], dtype='object')] are in the [columns]"
df[("c", 1)] # interestingly works
df = df.T
df.loc[("c", 1)] # crashes -> is it possible to access this index using loc?
I know I can use iloc, or the option described in "join multiindex dataframe with single-index dataframe breaks multiindex format", which makes sure that the MultiIndex format is preserved in the new DataFrame. But I am wondering whether it is possible without that.
Answer:
This seems like a bug to me.
As a workaround, you can wrap the tuple in a list so that the selection is 2D:
df.loc[:, [('c', 1)]]
Output:
   (c, 1)
0     1.0
1     1.0
Assignment through the same selection also works:
df.loc[:, [('c', 1)]] = [8,9]
Updated DataFrame:
     a    b  (c, 1)  (d, 2)
0  1.0  1.0       8     1.0
1  1.0  1.0       9     1.0
If you need a Series:
df.loc[:, [('c', 1)]].squeeze()
Output:
0    1.0
1    1.0
Name: (c, 1), dtype: float64
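For the transposed case in the question, the same list-wrapping trick should carry over to the row axis. Below is a minimal sketch that rebuilds the frames from the question. It checks that the concatenated columns are a flat Index holding tuples rather than a MultiIndex, which would fit the KeyError above, where the tuple appears to be read as the two labels 'c' and 1. The variable names is_multi, col, dft and row are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question.
df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(
    np.ones((2, 2)),
    columns=pd.MultiIndex.from_tuples([("c", 1), ("d", 2)]),
)
df = pd.concat([df1, df2], axis=1)

# After the concat the columns are a flat Index whose labels include tuples,
# not a MultiIndex.
is_multi = isinstance(df.columns, pd.MultiIndex)
print(type(df.columns).__name__, is_multi)

# Wrapping the tuple in a list makes .loc look it up as a single label.
col = df.loc[:, [("c", 1)]].squeeze(axis=1)
print(col)

# Same idea on the row axis after transposing.
dft = df.T
row = dft.loc[[("c", 1)]].squeeze(axis=0)
print(row)
```

Passing axis to squeeze keeps the result a Series even if the frame only has one row or one column left.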