I wonder if anyone could please offer some advice:
I have a data set with the following structure:
import pandas as pd
# Creates pandas DataFrame.
df1 = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [99, 98, 95, 90]}, index=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame({'Col1': [1, 2], 'Col2': [99, 98]}, index=['A', 'B'])
df3 = pd.DataFrame({'Col1': [3, 4], 'Col2': [95, 90]}, index=['C', 'D'])
df_dict = dict(obj1=df1, obj2=df2, obj3=df3)
mDF = pd.concat(list(df_dict.values()), keys=list(df_dict.keys()))
mDF.rename_axis(index=["ID", "property"], inplace=True)
print(mDF)
These multi-index dataframes have different number of "property" rows:
Col1 Col2
ID property
obj1 A 1 99
B 2 98
C 3 95
D 4 90
obj2 A 1 99
B 2 98
obj3 C 3 95
D 4 90
For example, I would like to calculate the sum of Col1 values for property A and B or all IDs. However, this is only possible for obj ID which have both properties tabulated.
How could I slice this dataframe to those IDs which have the A and B proper
CodePudding user response:
Query works with named index:
mDF.query('property in ["A","B"]')
Output:
Col1 Col2
ID property
obj1 A 1 99
B 2 98
obj2 A 1 99
B 2 98
Formatting the query string can be tricky sometimes. If you don't feel comfortable with it, you can use Ben.T's suggestion with .index.get_level_values
.