Home > Enterprise >  pandas MultiIndex intersection on partial levels
pandas MultiIndex intersection on partial levels

Time:11-26

say I have two dataframes with multiindices, where one of the indices is deeper than the other. Now I want to select only those rows from the one (deeper) dataframe where their partial index is included in the other dataframe.

Example input:

df = pandas.DataFrame(
    {
        "A": ["a1", "a1", "a1", "a2", "a2", "a2"],
        "B": ["b1", "b1", "b2", "b1", "b2", "b2"],
        "C": ["c1", "c2", "c1", "c1", "c1", "c2"],
        "V": [1, 2, 3, 4, 5, 6],
     }
).set_index(["A", "B", "C"])
df2 = pandas.DataFrame(
    {
        "A": ["a1", "a1", "a2", "a2"],
        "B": ["b1", "b3", "b1", "b3"],
        "X": [1, 2, 3, 4]
     }
).set_index(["A", "B"])

Visual:

          V
A  B  C
a1 b1 c1  1
      c2  2
   b2 c1  3
a2 b1 c1  4
   b2 c1  5
      c2  6

       X
A  B
a1 b1  1
   b3  2
a2 b1  3
   b3  4

Desired output:

result = pandas.DataFrame(
    {
        "A": ["a1", "a1", "a2"],
        "B": ["b1", "b1", "b1"],
        "C": ["c1", "c2", "c1"],
        "V": [1, 2, 4],
     }
).set_index(["A", "B", "C"])

Visual:

          V
A  B  C
a1 b1 c1  1
      c2  2
a2 b1 c1  4

I tried df.loc[df2.index] and df.loc[df.index.intersection(df2.index)] but that does not work.

I guess I could do df.join(df2, how="inner") and afterwards remove all the columns of df2 that were added, but that is cumbersome. Or is there a way to take away all the columns of df2?

I would appreciate any help.

CodePudding user response:

One option is to use isin on the specific labels common to both, and use the resulting boolean to filter df:

df.loc[df.index.droplevel('C').isin(df2.index)]
 
          V
A  B  C    
a1 b1 c1  1
      c2  2
a2 b1 c1  4
  • Related