say I have two dataframes with multiindices, where one of the indices is deeper than the other. Now I want to select only those rows from the one (deeper) dataframe where their partial index is included in the other dataframe.
Example input:
df = pandas.DataFrame(
{
"A": ["a1", "a1", "a1", "a2", "a2", "a2"],
"B": ["b1", "b1", "b2", "b1", "b2", "b2"],
"C": ["c1", "c2", "c1", "c1", "c1", "c2"],
"V": [1, 2, 3, 4, 5, 6],
}
).set_index(["A", "B", "C"])
df2 = pandas.DataFrame(
{
"A": ["a1", "a1", "a2", "a2"],
"B": ["b1", "b3", "b1", "b3"],
"X": [1, 2, 3, 4]
}
).set_index(["A", "B"])
Visual:
V
A B C
a1 b1 c1 1
c2 2
b2 c1 3
a2 b1 c1 4
b2 c1 5
c2 6
X
A B
a1 b1 1
b3 2
a2 b1 3
b3 4
Desired output:
result = pandas.DataFrame(
{
"A": ["a1", "a1", "a2"],
"B": ["b1", "b1", "b1"],
"C": ["c1", "c2", "c1"],
"V": [1, 2, 4],
}
).set_index(["A", "B", "C"])
Visual:
V
A B C
a1 b1 c1 1
c2 2
a2 b1 c1 4
I tried
df.loc[df2.index]
and df.loc[df.index.intersection(df2.index)]
but that does not work.
I guess I could do df.join(df2, how="inner")
and afterwards remove all the columns of df2
that were added, but that is cumbersome. Or is there a way to take away all the columns of df2
?
I would appreciate any help.
CodePudding user response:
One option is to use isin
on the specific labels common to both, and use the resulting boolean to filter df
:
df.loc[df.index.droplevel('C').isin(df2.index)]
V
A B C
a1 b1 c1 1
c2 2
a2 b1 c1 4